IOS Press
Kluwer Academic Publishers
The NATO Science Series continues the series of books published formerly as the NATO ASI Series.
The NATO Science Programme offers support for collaboration in civil science between scientists of
countries of the Euro-Atlantic Partnership Council. The types of scientific meeting generally supported
are "Advanced Study Institutes" and "Advanced Research Workshops", although other types of
meeting are supported from time to time. The NATO Science Series collects together the results of
these meetings. The meetings are co-organized by scientists from NATO countries and scientists from
NATO's Partner countries - countries of the CIS and Central and Eastern Europe.
Advanced Study Institutes are high-level tutorial courses offering in-depth study of latest advances
in a field.
Advanced Research Workshops are expert meetings aimed at critical assessment of a field, and
identification of directions for future action.
As a consequence of the restructuring of the NATO Science Programme in 1999, the NATO Science
Series has been re-organized and there are currently five sub-series. Please consult the
following web sites for information on previous volumes published in the series, as well as details of
earlier sub-series:
http://www.nato.int/science
http://www.wkap.nl
http://www.iospress.nl
http://www.wtv-books.de/nato_pco.htm
ISSN 1387-6694
Neural Networks
for Instrumentation, Measurement
and Related Industrial Applications
Edited by
Sergey Ablameyko
Institute of Engineering Cybernetics,
National Academy of Sciences of Belarus, Belarus
Liviu Goras
Department of Fundamental Electronics,
Technical University of Iasi, Romania
Marco Gori
Department of Information Engineering,
University of Siena, Italy
and
Vincenzo Piuri
Department of Information Technologies,
University of Milan, Italy
IOS
Press
Ohmsha
Publisher
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
fax: +31 20 620 3419
e-mail: [email protected]
Distributor in Japan
Ohmsha, Ltd.
3-1 Kanda Nishiki-cho
Chiyoda-ku, Tokyo 101-8460
Japan
fax: +81 3 3233 2426
LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.
PRINTED IN THE NETHERLANDS
Preface
The aims of this book are to disseminate wider and in-depth theoretical and practical
knowledge about neural networks in measurement, instrumentation and related industrial
applications, to create a clear consciousness about the effectiveness of these techniques as
well as the measurement and instrumentation application problems in industrial
environments, to stimulate the theoretical and applied research both in the neural networks
and in the industrial sectors, and to promote the practical use of these techniques in the
industry.
This book is derived from the exciting and challenging experience of the NATO
Advanced Study Institute on Neural Networks for Instrumentation, Measurement, and
Related Industrial Applications - NIMIA'2001, held in Crema, Italy, from 9 to 20 October
2001. During this meeting the lecturers and the attendees had the opportunity of learning
and discussing the theoretical foundations and the practical use of neural technologies for
measurement systems and industrial applications. This book aims to expand the audience of
this meeting for wider and more durable benefits.
The editors of this book are very grateful to the lecturers of NIMIA'2001, who greatly
contributed to the success of the meeting and to making this book an outstanding starting
point for further dissemination of the meeting achievements.
The editors would also like to thank NATO for having generously sponsored
NIMIA'2001 and the publication of this book. Special thanks are due to Dr. F. Pedrazzini,
the PST Programme Director, for his highly valuable suggestions and guidance in
organizing and running the meeting.
A final thank you to the staff at IOS Press, who made the realization of this book much
easier.
The Editors
Sergey ABLAMEYKO
Institute of Engineering Cybernetics, National Academy of Sciences of Belarus
Surganova Str. 6, 220012 Minsk, Belarus
Liviu GORAS
Department of Fundamental Electronics, Technical University of Iasi
Copou Blvd II, 6600 Iasi, Romania
Marco GORI
Department of Information Engineering, Universita' degli Studi di Siena
via Roma 56, 53100 Siena, Italy
Vincenzo PIURI
Department of Information Technologies, University of Milan
via Bramante 65, 26013 Crema, Italy
Acknowledgements
The ASI NIMIA'2001 was sponsored by
NATO - North-Atlantic Treaty Organization (Grant No. PST.ASI.977440)
and organized with the technical cooperation of
IEEE I&MS - IEEE Instrumentation and Measurement Society
IEEE NNC - IEEE Neural Network Council
INNS - International Neural Network Society
ENNS - European Neural Network Society
IAPR TC3 - International Association for Pattern Recognition - Technical Committee
on Neural Networks & Computational Intelligence
EUREL - Convention of National Societies of Electrical Engineers of Europe
AEI - Italian Association of Electrical and Electronic Engineers
SIREN - Italian Association for Neural Networks
AI*IA - Italian Association for Artificial Intelligence
UNIMI DTI - University of Milan - Department of Information Technologies
Contents

Preface

1. Introduction to Neural Networks for Instrumentation, Measurement, and Industrial
   Applications, Vincenzo Piuri and Sergey Ablameyko 1
   1.1 … 1
   1.2 … 2
   1.3 The book organization 3
   1.4 The book topics 3
   1.5 The socio-economical implications 6
2. The Fundamentals of Measurement Techniques, Alessandro Ferrero and
   Renzo Marchesi 9
   2.1 … 9
   2.2 … 10
   2.3 The uncertainty concept 11
   2.4 Uncertainty: definitions and methods for its determination 12
   2.5 How can the results of different measurements be compared? 15
   2.6 … 16
   2.7 … 17
3. … 19
   3.1 … 19
   3.2 … 20
   3.3 … 28
4. … 43
   4.1 Introduction 43
   4.2 The main steps of modeling 44
   4.3 Black box model structures 49
   4.4 Neural networks 50
   4.5 Static neural network architectures 51
   4.6 Dynamic neural architectures 54
   4.7 Model parameter estimation, neural network training 58
   4.8 Model validation 62
   4.9 Why neural networks? 68
   4.10 … 69
   4.11 … 77
5. … 79
   5.1 Neural control 79
   5.2 Neural approximations 82
   5.3 Gradient algebra 85
   5.4 Neural modeling of dynamical systems 90
   5.5 Stabilization 96
   5.6 Tracking 101
   5.7 Optimal control 106
   5.8 Reinforcement learning 110
   5.9 Concluding remarks 114
6. … 119
   6.1 Introduction 119
   6.2 Multilayer neural networks 122
   6.3 Dynamical systems 123
   6.4 How can we verify if the behavior is chaotic? 126
   6.5 Embedding parameters 128
   6.6 Lyapunov's exponents 132
   6.7 A neural network approach to compute the Lyapunov's exponents 134
   6.8 Prediction of chaotic processes by using neural networks 138
   6.9 State space reconstruction 140
   6.10 Conclusion 143
7. … 145
   7.1 Introduction 145
   7.2 Digital imaging systems 146
   7.3 Image system design parameters and modeling 148
   7.4 Multisensor image classification 148
   7.5 Pattern recognition and classification 149
   7.6 Image shape and texture analysis 152
   7.7 Image compression 153
   7.8 Nonlinear neural networks for image compression 155
   7.9 Linear neural networks for image compression 155
   7.10 Image segmentation 155
   7.11 Image restoration 156
   7.12 Applications 156
   7.13 Future research directions 160
8. … 167
9. … 189
   9.1 … 189
   9.2 … 197
   9.3 … 207
10. … 219
   10.1 … 219
   10.2 … 220
   10.3 … 223
   10.4 … 228
   10.5 … 236
11. … 249
   11.1 … 249
   11.2 … 257
   11.3 … 263
12. … 273
   12.1 … 273
   12.2 … 275
   12.3 … 276
   12.4 … 282
   12.5 … 288
13. Neural Networks in the Medical Field, Marco Parvis and Alberto Vallan 291
   13.1 Introduction 291
   13.2 Role of neural networks in the medical field 291
   13.3 Prediction of the output uncertainty of a neural network 299
   13.4 Examples of applications of neural networks to the medical field 312
Index 323
Author Index 329
Chapter 1
Introduction to Neural Networks
for Instrumentation, Measurement,
and Industrial Applications
Vincenzo PIURI
Department of Information Technologies, University of Milan
via Bramante 65, 26013 Crema, Italy
Sergey ABLAMEYKO
Institute of Engineering Cybernetics, National Academy of Sciences of Belarus
Surganova Str. 6, 220012 Minsk, Belarus
The aim of the meeting was in fact not limited to the direct interaction with the
attendees, but was also directed to bringing this knowledge to the attention of a world-wide
audience.
1.3. The book organization
Like NIMIA'2001, this book presents the basic issues concerning the neural networks for
sensors and measurement systems, for identification in instrumentation and measurement,
for instrumentation and measurement dedicated to system and plant control, and for signal
and image processing in instrumentation and measurement. The underlying and unifying
thread of the presentation is the interdisciplinary and comprehensive point of view of the
metrological perspective. In addition, the book focuses on the use, the benefits, and the problems of
neural technologies in instrumentation and measurement for some relevant application
areas. This allows for a vertical analysis in each specific industrial area, encompassing
different theoretical, technological, and implementation aspects: the specific application
areas of instrumentation and measurement based on neural technologies are diagnosis,
robotics, laser processing, electrical measurement systems, virtual environments, and
medical systems.
Each chapter focuses on a specific topic. The presentation starts from the basic issues, the
techniques, the design methodologies, and the application problems. First, it tackles the
theoretical and practical issues concerning the use of neural networks to enhance quality,
characteristics, and performance of the traditional approaches and solutions. Then, it
provides an overview of the industrial relevance and impact of the neural techniques by
means of a structured presentation of several industrial examples.
The program structure of NIMIA'2001 made it a unique and successful forum for
interactive discussion directed to higher dissemination of innovative knowledge,
stimulation of interdisciplinary research as well as application, better understanding of the
technological opportunities, advancement of the educational consciousness about the
relevance of the metrological aspects for applicability to industry, promotion of the
practical use of these techniques in the industry, and overall advancement of industry and
products. Each and every participant brought a contribution from his or her specific knowledge
to the scientific and practitioner communities, for mutual benefit and synergy.
This book aims to extend these benefits to all experts in the neural network areas as well
as in metrology and in the industrial applications, for mutual sharing of in-depth
interdisciplinary knowledge and to support further advancement both of the neural
disciplines and of the industrial application opportunities.
1.4. The book topics
From the NIMIA'2001 experience, this book tackles some of the most relevant areas in the
use of neural networks for advanced instrumentation, measurement procedures and related
industrial applications.
The first six chapters are dedicated to general issues and methodologies for the use of
neural networks in any application area: namely, sensors and measurement systems, system
identification, system control, signal processing, and image processing.
The first and most basic issue in understanding the significance and usefulness of any
quantity observed in a system consists of characterizing that quantity from the metrological
point of view. This is the target of Chapter 2. The analysis of sensors, transducers,
On the basis of these general technologies and methodologies, some specific application
areas are then discussed in detail: namely, diagnosis, robotics, industrial laser processing,
electrical and dielectrical applications, virtual environments, and medical applications.
These cases have particular relevance from the industrial point of view, since they constitute
the leading edge for many manufacturing processes and are promising solutions for present
and future applications.
System diagnosis is a recent application area that largely benefits from the inference and
generalization mechanisms provided by neural networks. Chapter 8 tackles this
application area. A non-intrusive approach based on signal and image processing to detect
end-of-production defects and operating-life faults, as well as to classify
them, is highly beneficial in many industrial applications, e.g., in avionics, automotive,
mechanics, and electronics, both to enhance the quality of production processes and of
products. The basic issues of using neural networks to create high-level sensors in this
application area are shown and evaluated with respect to conventional approaches.
Robotics has many opportunities to make use of neural networks to tackle some major
problems concerning sensing and the related applications like control, signal and image
processing, vision, motion planning, and multi-agent coordination. Chapter 9 is dedicated to
this area. The neural techniques are well suited to the non-linearity of these tasks as well as
to the need for adaptation to unknown scenarios. The integrated use of these methods, also in
conjunction with conventional components, is discussed and evaluated. Evolutionary and
adaptive solutions will make the use of robotic systems even more attractive in industry and
in daily life (home automation and assistance to elderly and disabled people), especially
whenever the operating environment is partially or largely unknown.
Industrial laser processing is an innovative production process for many application
fields. The undoubted superior quality of laser cutting, drilling, and welding with respect to
conventional processes makes this technology highly appreciated in high-technology
industries (e.g., electronics) as well as in mass production (e.g., mechanical industry,
automotive). The problems related to real-time control of laser processing and to quality
monitoring are discussed in Chapter 10. The use of neural techniques is presented as a
highly innovative solution that outperforms other approaches thanks to intrinsic adaptivity
and generalization ability.
Electrical and dielectrical applications are among the fields in which neural technologies
have been widely and successfully used for several years. Chapter 11 is dedicated to this topic.
Electric signal analysis is important to evaluate the quality and the behavior of power
supply and, consequently, to monitor and control power plants and distribution networks.
Prediction of power load is another application that benefits from neural prediction ability
to foresee the expected power needs and act in advance on power generators and
distribution. Signal analysis is an innovative aspect of monitoring, control and diagnosis for
electric engines and transformers. Observation of partial discharges in dielectrical materials
and systems is relevant to guarantee the correct operation of capacitors and insulators.
These aspects are widely discussed and compared with conventional approaches in the
chapter.
Virtual environments are one of the most recent areas that are becoming important in the
industrial and economic scenario. They can be used for simulated reality, e.g., in
telecommunication (e.g., videoconferencing), training on complex systems, complex
system design (e.g., of robotic systems), electronic commerce, interactive video,
entertainment, and remote medical diagnosis and surgery. The adaptivity and generalization
ability of neural networks allow for introducing advanced features in these environments
and for coping with non-linear aspects, dynamic variations of the operating conditions, and
evolving environments. The use of neural networks and their benefits are analyzed and
evaluated in Chapter 12.
Medical applications have had, and will continue to have, great expansion through adaptive
solutions based on neural networks. In fact, it is relatively easy to collect examples for many of these
applications, while it is practically impossible to derive a conventional algorithm with the
same efficiency and accuracy. Neural networks are able to analyze biomedical signals, e.g.,
electrocardiographic and encephalographic signals, breath monitoring data, and
nervous-system signals. Feature extraction and prediction by neural networks are relevant tools to
monitor and foresee human conditions for advanced health care. Neural image analysis can be
used for image reconstruction and enhancement. Prostheses include neural components to provide
a more natural behavior; artificial senses (hearing, vision, smell, taste, touch) can also be
exploited in robotics and industrial applications. Diagnostic equipment has made impressive
advancements, especially by using signal and image processing for non-intrusive scanning. These are the
main cases considered and discussed in Chapter 13.
1.5. The socio-economical implications
Training researchers and practitioners from several theoretical and application areas in
neural networks for measurement, instrumentation and related industrial applications is
important, since these topics have, and will continue to have, a major role in developing new
theoretical background, in further scientific advancement, and in the implementation of new
practical solutions, encompassing, among many others, embedded systems and intelligent
manufacturing systems.
Training of researchers and practitioners is an investment in the advancement of
science and industry that will be paid back in the near future by technological
advancement in knowledge, production processes, and products. This will in fact make it
possible to maintain, to expand, or even to achieve a leading role in the international scenario.
The less favored economic areas will particularly benefit from this training: coming into contact
with the leading experts and the most advanced technologies will be useful for their
economic and industrial advancement, for enhancing their worldwide competitiveness, and
for creating new job opportunities.
NIMIA'2001 and this book aim to contribute substantially to the above goals. NIMIA'2001
had high relevance for training researchers and practitioners, since leading scientists and
practitioners were gathered from around the world. This allowed the attendees to have wide
and in-depth scientific and technical discussions with them, for a better understanding of
innovative topics and a sharing of innovative knowledge. The authors and the editors of this
book hope that it will be useful to many more people around the world.
The increasing industrial interest and the possibility of successful industrial application
of soft computing technologies for advanced products and enhanced production processes
provide a great opportunity for highly trained researchers and practitioners to find a job or
enhance their position. A better understanding and knowledge about the book topics will
result in better opportunities for developing the industry, for expanding the employment,
and for enhancing the employment quality and remuneration. The authors and the editors
therefore hope that this book will have a great impact on the careers of researchers and
practitioners, especially the young ones.
Continuous education and worldwide dissemination are additional issues that need to be
considered in order to enhance and expand the benefits provided by higher training in the
topics of this book. NIMIA'2001 was the starting point that allowed for coordinating,
homogenizing, and consolidating educational efforts on neural technologies for
Chapter 2
The Fundamentals
of Measurement Techniques
Alessandro FERRERO
Department of Electrical Engineering, Politecnico di Milano
piazza L. da Vinci 32, 20133 Milano, Italy
Renzo MARCHESI
Department of Energetics, Politecnico di Milano
piazza L. da Vinci 32, 20133 Milano, Italy
Abstract. Experimental knowledge is the basis of the modern approach to all
fields of science and technology, and measurement activity represents the way this
knowledge can be obtained. In this respect, the qualification of the measurement
results is the most critical point of any experimental approach. This paper provides
the fundamental definitions of measurement science and covers the
methods presently employed to qualify, from the metrological point of view, the
result of a measurement. Reference is made to the recommendations presently issued
by the international standardization organizations.
Figure 1: Representation of the measurement process together with the five agents that take part in it.
possible lack of calibration, its age, and a number of other different reasons still related to
the non ideality of the instrument.
Similarly, the measurement method is usually based on the exploitation of a single
physical phenomenon, whilst other phenomena may interfere with the considered one, and
alter the result of the measurement in such a way that the "true" value cannot be obtained.
Finally, the operator is also supposed to contribute to making the result of the
measurement different from the expected "true" value, for several reasons such as,
for instance, insufficient training, an incorrect reading of the instrument indication, and an
incorrect post-processing of the readings.
The effects of this non-ideal behavior of the agents that take part in the measurement
process can be easily experienced by repeating the same measurement procedure a number
of times: the results of such measurements always differ from each other, even if the
measurement conditions are not changed. Moreover, if the measurement is repeated by
another operator, reproducing the same measurement conditions somewhere else, different
results are obtained again. If the "true" measurement result is represented as the center of a
target, as in Fig. 2, each different result of a measurement is represented as a different
shot, and measurements made by different operators under slightly different conditions can
be represented as two different shot patterns on the target.
As a matter of fact, this means that expressing the result of a measurement with a single
number (together with the measurement unit) is totally meaningless, because this single
number cannot be supposed to represent the measured quantity in a better way than any
other result obtained by repeated measurements.
Moreover, since the same result can rarely be obtained as the result of a new
measurement, there is no way to compare measurement results, because they are
generally always different.
This represents an unacceptable limitation of the measurement practice, since the final
aim of any measurement activity is the quantitative comparison: this is true not only when
technical and scientific issues are involved, where the results of measurements are
compared in order to assess whether a component meets the technical specifications or not,
or whether a theory represents a physical phenomenon in the correct way or not, but also when
commercial and legal issues are involved, where quantities and qualities of goods have to be
compared, or penalties have to be issued if a tolerance level is passed, and so on.
2.3. The uncertainty concept
The problem outlined in the previous section has been well known since the origin of
measurement practice, and an attempted solution was provided, in the past, by considering
the measurement error as the difference between the actual measured value and the "true"
value of the measurand. However, this approach is "philosophically" incorrect, since the
"true" value cannot be known.
To overcome this further problem, the uncertainty concept was introduced in the
late 1980s as a quantifiable attribute of the measurement, able to assess the quality of the
measurement process and result. This concept comes from the awareness that when all the
known or suspected components of error have been evaluated, and the appropriate
corrections have been applied, there still remains an uncertainty about the correctness of the
stated results, that is, a doubt about how well the result of the measurement represents the
value of the quantity being measured [1].
This concept can be more precisely perceived if three general requirements are
considered.
1. The method for evaluating and expressing the uncertainty of the result of a measurement
should be universal, that is, it should be applicable to all kinds of measurements and all
types of input data used in measurements.
2. The actual quantity used to express the uncertainty should be internally consistent and
transferable. Internal consistency means that the uncertainty should be directly
derivable from the components that contribute to it, independently of how these
components are grouped or of how the components are decomposed into
subcomponents. As for transferability, it should be possible to use directly the
uncertainty evaluated for one result as a component in evaluating the uncertainty of
another measurement in which the first result is used.
3. The method for evaluating and expressing the uncertainty of a measurement should be
capable of providing a confidence interval, that is an interval about the measurement
result within which the values that could reasonably be attributed to the measurand may
be expected to lie with a given level of confidence.
In 1992, the International Organization for Standardization (ISO) provided a
well-pondered answer to these requirements by issuing the Guide to the Expression of
Uncertainty in Measurement [1], where the concept of uncertainty is defined and operative
prescriptions are given on how to estimate the uncertainty of the result of a measurement in
agreement with the above requirements. More recently, the Guide has been incorporated into
several standards issued by the international (IEC) and national (UNI-CEI, DIN, AFNOR)
standardization organizations.
2.4. Uncertainty: definitions and methods for its determination
The ISO Guide defines the uncertainty of the result of a measurement as a parameter,
associated with the result itself, that characterizes the dispersion of the values that could
reasonably be attributed to the measurand.
The adverb "reasonably" is the key point of this definition, because it leaves a large
amount of discretionary power to the operator, but it does not exempt him from following
some basic guidelines that come from the state of the art of the measurement science.
These guidelines are provided by the ISO Guide itself, which outlines two different
ways for expressing the uncertainty.
The first way considers the uncertainty of the result of a measurement as expressed by a
standard deviation, or a given multiple of it. This means that the distribution of the possible
measurement results is known, or assumptions can be made about it. If, for example, the results
of a measurement are supposed to be distributed according to a normal distribution about
the mean value x̄, as shown in Fig. 3, the uncertainty can be expressed by the distribution
standard deviation σ. This means that the probability that a measured value falls within the
interval (x̄ - σ, x̄ + σ) is 68.3%. The uncertainty can also be expressed by a multiple of
the standard deviation, e.g. 3σ, so that the probability that a measured value falls within the
interval (x̄ - 3σ, x̄ + 3σ) climbs up to 99.7%. This example shows that the third
requirement in the previous section is satisfied, since it is possible to derive a confidence
interval, with a given confidence level, from the estimated value of the uncertainty.
The second way considers the uncertainty as a confidence interval about the measured
value, as shown in Fig. 4. This method is very often employed to specify the accuracy of a
digital multimeter, where the half-width of the confidence interval is given as a = z% of reading +
y% of full scale.
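As a small illustration, the "z% of reading + y% of full scale" specification can be turned directly into the half-width of the confidence interval. The following Python sketch uses purely hypothetical specification figures, not those of any real instrument:

```python
def interval_halfwidth(reading, full_scale, z_pct, y_pct):
    """Half-width a of the confidence interval for an instrument whose
    accuracy specification is +/-(z% of reading + y% of full scale)."""
    return reading * z_pct / 100.0 + full_scale * y_pct / 100.0

# Hypothetical DVM specification on the 10 V range:
# +/-(0.005% of reading + 0.001% of full scale), for a 7.5 V reading.
a = interval_halfwidth(7.5, 10.0, 0.005, 0.001)
print(f"a = {a * 1e6:.0f} uV for a 7.5 V reading")
```

Note that the reading-dependent term dominates near full scale, while the full-scale term sets a floor on the interval width for small readings.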
When the uncertainty of the measurement result x is expressed as a standard deviation it
is called "standard uncertainty" and is written with the notation u(x).
As far as the evaluation of the uncertainty components is concerned, the ISO Guide
suggests that some components may be evaluated from the statistical distribution of the
results of a series of measurements and can be characterized by experimental standard
deviations. Of course, this method can be applied whenever a significant number of
measurement results can be obtained by repeating the measurement procedure under the
same measurement conditions.
The evaluation of the standard uncertainty by means of the statistical analysis of a series
of observations is defined by the ISO Guide as the "type A evaluation".
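In practice a type A evaluation reduces to computing the experimental standard deviation of the mean of the repeated readings. A minimal Python sketch, with a hypothetical series of voltage readings:

```python
import math
import statistics

# Hypothetical repeated readings of the same voltage, in volts, obtained by
# repeating the procedure under unchanged measurement conditions.
readings = [9.98, 10.02, 10.01, 9.99, 10.03, 9.97, 10.00, 10.02]

n = len(readings)
x_mean = statistics.mean(readings)   # best estimate of the measurand
s = statistics.stdev(readings)       # experimental standard deviation of the readings
u_a = s / math.sqrt(n)               # type A standard uncertainty of the mean

print(f"x = {x_mean:.4f} V, u(x) = {u_a:.4f} V")
```

The division by the square root of n reflects the fact that the mean of n independent readings is less dispersed than the single readings.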
Other components of uncertainty may be evaluated from assumed probability
distributions, where the assumption may be based on experience or other information.
These components are also characterized by the standard deviation of the assumed
distribution. This method is applied when the measurement procedure cannot be repeated or
when the confidence interval about the measurement result is known a priori, e.g. by means
of calibration results.
The evaluation of the standard uncertainty by means other than the statistical analysis of
a series of observations is defined by the ISO Guide as the "type B evaluation".
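A common type B case, following the ISO Guide, is an a priori interval ±a (e.g. from a specification or calibration certificate) within which the error is assumed to lie with a rectangular (uniform) distribution; the associated standard uncertainty is then a/√3. A sketch with a hypothetical interval:

```python
import math

def u_type_b_rectangular(a):
    # Standard deviation of a rectangular (uniform) distribution over [-a, +a]:
    # u = a / sqrt(3), the usual choice when only the interval bounds are known.
    return a / math.sqrt(3)

# Hypothetical case: a calibration certificate guarantees the error within +/-0.05 V.
u_b = u_type_b_rectangular(0.05)
print(f"u(x) = {u_b:.4f} V")
```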
When the uncertainty is requested to represent an interval about the result of a
measurement within which the values that could reasonably be attributed to the measurand
are expected to lie with a given level of confidence, then the expanded uncertainty U is
defined as the product of the standard uncertainty u(x) by a suitable factor k, called the
coverage factor:

U = k · u(x)    (1)
Of course, the association of a specific level of confidence to the interval defined by the
expanded uncertainty requires that explicit or implicit assumptions are made regarding the
probability distribution of the measurement results. The level of confidence that may be
attributed to this interval can be known only to the extent to which such assumptions may
be justified.
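Numerically, equation (1) is a simple scaling of the standard uncertainty; a short sketch with a hypothetical value of u(x), assuming a normal distribution of the results so that a level of confidence can be attached to k:

```python
def expanded_uncertainty(u, k):
    # U = k * u(x); for an assumed normal distribution, k = 2 corresponds to a
    # level of confidence of about 95%, and k = 3 to about 99.7%.
    return k * u

u = 0.012  # hypothetical standard uncertainty
print(f"U = {expanded_uncertainty(u, 2):.3f} (k = 2)")
print(f"U = {expanded_uncertainty(u, 3):.3f} (k = 3)")
```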
All above considerations have been derived for the direct measurement of a single
quantity and apply to the results of such a measurement. Quite often, however, the value of
a quantity to be measured is obtained from a mathematical computation of the results of
other measurements.
According to the second requirement reported in Section 2.3 above, the uncertainty
that has to be associated with the result of such a measurement should be obtained from the
uncertainty values associated with the single measurement results employed in the evaluation
of the measurand. The ISO Guide defines such an uncertainty value as the "combined standard
uncertainty", that is, the "standard uncertainty of the result of a measurement when that
result is obtained from the values of a number of other quantities, equal to the positive
square root of a sum of terms, the terms being the variances or covariances of these other
quantities weighted according to how the measurement result varies with changes in these quantities".
Such a definition can be easily expressed with a mathematical equation when the result
y of a measurement depends on N other measurement results xi, 1 ≤ i ≤ N, according to
the relationship:

y = f(x1, x2, ..., xN)    (2)

Under this assumption, the combined standard uncertainty associated with y is given by:

uc²(y) = Σ(i=1..N) (∂f/∂xi)² u²(xi) + 2 Σ(i=1..N-1) Σ(j=i+1..N) (∂f/∂xi)(∂f/∂xj) u(xi, xj)    (3)

where u(xi) is the standard uncertainty associated with the measurement result xi, and
u(xi, xj) = u(xj, xi) is the estimated covariance of xi and xj.
The degree of correlation between xi and xj can be expressed in terms of the correlation
coefficient:

r(xi, xj) = u(xi, xj) / [u(xi) · u(xj)]    (4)

where r(xi, xj) = r(xj, xi) and -1 ≤ r(xi, xj) ≤ +1, so that (3) can be rewritten as:

uc²(y) = Σ(i=1..N) (∂f/∂xi)² u²(xi) + 2 Σ(i=1..N-1) Σ(j=i+1..N) (∂f/∂xi)(∂f/∂xj) u(xi) u(xj) r(xi, xj)    (5)

If the measurement results xi and xj are totally uncorrelated, then r(xi, xj) = 0 and
therefore the combined standard uncertainty is given by:

uc(y) = [ Σ(i=1..N) (∂f/∂xi)² u²(xi) ]^(1/2)    (6)
On the contrary, if the measurement results xi and xj are totally correlated, then r(xi, xj) = 1.
The effect of the correlation on the uncertainty estimation can be fully perceived if the
following example is considered.
Let us suppose that the electric power consumed by a dc load is measured as P = VI,
where V is the supply voltage and I is the current flowing through the load. Let us also
suppose that V and I are measured by two independent DVMs, the measured value for the
voltage is V = 100 V, with a standard uncertainty u(V) = 0.2 V, and the measured value for
the current is I = 2 A, with a standard uncertainty u(I) = 0.01 A.
Since two independent DVMs have been considered for the voltage and current
measurements, the correlation coefficient is r(V, I) = 0 and hence equation (6) can be used
for the evaluation of the uncertainty associated with the measured value P = 200 W for the
electric power.
It is:

u_c^2(P) = I^2 u^2(V) + V^2 u^2(I) = (2)^2 (0.2)^2 + (100)^2 (0.01)^2 = 1.16 \, \text{W}^2

and therefore the combined standard uncertainty provided by (6) is u_c(P) = 1.08 W.
Let us now suppose that the same DVM is used for both the voltage and current
measurements, and that the uncertainty values associated with the measured values of voltage
and current are exactly the same as those estimated for the previous situation. In this case
the measurements are totally correlated, since the same instrument has been used. The
correlation coefficient is hence r(V, I) = 1, equation (5) must be used, and therefore the
combined standard uncertainty associated with the measured value of P is u_c(P) = I u(V) + V u(I) = 1.4 W.
The effect of an incorrect estimation of the correlation is quite evident.
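As a numerical sketch, the two evaluations above can be reproduced from the general formula (5), with sensitivity coefficients dP/dV = I and dP/dI = V (the function name below is illustrative, not from the text):

```python
import math

def combined_uncertainty_P(V, I, uV, uI, r):
    """Combined standard uncertainty of P = V*I according to the general
    formula (5): the sensitivity coefficients are dP/dV = I and dP/dI = V."""
    variance = (I * uV) ** 2 + (V * uI) ** 2 + 2 * I * V * uV * uI * r
    return math.sqrt(variance)

# Two independent DVMs: r(V, I) = 0, i.e. equation (6)
print(combined_uncertainty_P(100, 2, 0.2, 0.01, 0.0))  # ~1.08 W
# The same DVM for both readings: r(V, I) = 1, i.e. equation (5)
print(combined_uncertainty_P(100, 2, 0.2, 0.01, 1.0))  # ~1.4 W
```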
2.5. How can the results of different measurements be compared?
One of the most important reasons for introducing the concept of uncertainty in
measurement recalled in the previous sections is the need for comparing the results of
different measurements of the same quantity. This is quite a critical problem, which is not
confined to the technical field but also involves commercial and legal issues whenever the
same quantity has to be evaluated in different places in order to assess, for instance, if the
delivered goods meet the specifications provided in the purchase order.
It is quite evident that the uncertainty associated with the different measurement results
plays a fundamental role, since it provides confidence intervals within which the value that
could be reasonably attributed to the measurand is expected to lie: it can be immediately
recognized that the results of two different measurements of the same quantity can be
considered as equal if the two confidence intervals defined by their uncertainty values are at
least overlapping. Fig. 5 shows this concept.
In this figure the terms "compatible" and "not compatible" are used since they are
generally employed instead of "equal" and "different"; in fact, the values of the
measurement results can never be considered as equal or different in a strict mathematical
sense. However, if the analysis of the measurement uncertainty shows that two results of
two different measurements belong to the same confidence interval about the expected
value of the measurand, the same results are considered as "compatible".
Figure 5: Example of compatible (x1 and x2) and not compatible (x1 and x3) measurement results,
based on whether the confidence intervals provided by the estimated uncertainty values
are (partially) overlapping or not.
The analysis of the confidence intervals based simply on their partial overlapping in
order to assess whether two measurements are compatible or not may still lead to
ambiguous situations. The most common situation is that of three measurements, x1, x2, x3,
with the confidence interval about x1 partially overlapping the confidence interval about x2,
and this confidence interval partially overlapping the confidence interval about x3, but in
such a way that the interval about x1 does not overlap the confidence interval about x3 at
all. This situation shows that x1 is compatible with x2, and x2 is compatible with x3, but x1 is not
compatible with x3. If x1 and x3 are not compared directly, but only through a comparison
with x2, they can be supposed to be compatible, while they are not.
In order to overcome such a problem, a new definition of compatibility has been
proposed, which is becoming more and more popular among calibration laboratories. This
definition states that two measurement results x1 and x2, associated with the standard
uncertainty values u(x1) and u(x2) respectively, are considered compatible if:

|x_1 - x_2| \le K \sqrt{u^2(x_1) + u^2(x_2) - 2 \, r(x_1, x_2) \, u(x_1) u(x_2)}    (7)

where r(x1, x2) is the correlation factor between x1 and x2 and K is the employed coverage
factor.
By comparing (7) and (5), it can be readily checked that (7) represents the combined
expanded uncertainty associated with |x1 - x 2 |. Therefore, the two results are considered
compatible when their distance is lower than the combined expanded uncertainty with
which this distance can be estimated.
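A minimal sketch of this compatibility test, assuming the expanded-uncertainty form of equation (7); the readings below are made up for the example:

```python
import math

def compatible(x1, u1, x2, u2, r=0.0, K=2.0):
    """Compatibility test in the sense of equation (7): the two results are
    compatible when their distance does not exceed the expanded combined
    uncertainty of x1 - x2 (K is the coverage factor)."""
    u_diff = math.sqrt(u1 ** 2 + u2 ** 2 - 2 * r * u1 * u2)
    return abs(x1 - x2) <= K * u_diff

# Hypothetical uncorrelated readings of the same quantity, K = 2
print(compatible(10.00, 0.05, 10.08, 0.05))  # True
print(compatible(10.00, 0.05, 10.40, 0.05))  # False
```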
2.6. The role of the standard and the traceability concept
The concepts explained in the previous sections show the meaning of uncertainty in
measurement and provide a few guidelines for estimating the uncertainty and comparing the
results of different measurements. However, one main question still appears to be open:
how can it be guaranteed that the measurement result, together with the associated uncertainty
value, really characterizes "the dispersion of values that could reasonably be attributed to
the measurand"?
Indeed, the analysed procedures are mainly statistical computations, based on the
assumption that the possible results of the measurement are distributed according to a given
probability density function. This assumption is in turn based on experimental evidence or a
priori knowledge, but cannot generally guarantee that the actual value of the measurand lies
within the assumed distribution with the given confidence level.
The solution to this problem is found in the correct involvement of the standard in the
measurement procedure, as shown in Fig. 1. In fact, if the result of a measurement is
compared with the value of the standard, it is possible to state whether the result itself is
compatible with the actual value of the measurand (that is, the actual value lies within the
confidence interval provided by the estimated uncertainty) or not; in the latter case the result
should be discarded.
The procedure that allows the result of a measurement to be compared with the value of the
standard is called "calibration".
The calibration can be done, of course, by direct comparison with the standard. Though
this is the most accurate way to calibrate a measurement device, it is generally expensive
and subject to long "waiting lists", due to the low number of standards available.
Furthermore, standards are not always available for every measured quantity, and therefore
the measurement result must be traced back to the values of the available standards.
An alternative way of calibrating is to compare the measurement result with the one
provided by another, already calibrated, measurement device. Of course, since an indirect
comparison is performed, the uncertainty that can be assigned to the results provided by a
measurement device calibrated in such a way is higher than the one that could be assigned
by direct comparison with the value of the standard.
When this indirect calibration is adopted, several steps may occur before the direct
comparison with the value of the standard is reached: of course, the more steps there are,
the higher the uncertainty value. The property of a measurement result of being traceable
back to a standard, no matter whether in a direct or indirect way, is called "measurement
traceability".
Traceability is a strict requirement when the results of different measurements
performed on the same quantity with different instruments and methods have to be
compared. This is the only way to assess whether the results are actually compatible or not.
Compliance with this requirement is of great importance also from the commercial
and legal points of view. In fact, since all national standards are compatible with each other,
when the result of a measurement is traced to its national standard, it is also traced to the
standards of any other country whose standard is recognized by the International
Organization for Standardization. This avoids, for instance, the need to duplicate the
measurement procedures in commercial transactions.
2.7. Conclusions
The very fundamental concepts of the measurement technique have been briefly reported in
this paper. The key role played by the uncertainty concept has been emphasized as the only
possible way to characterize the result of a measurement and define a confidence interval
within which the value that could reasonably be attributed to the measurand is expected to
lie.
The guidelines provided by the ISO Guide to the Expression of Uncertainty in
Measurement [1] for the estimation of the uncertainty have been briefly recalled and
discussed.
Indications on how to take into account the estimated uncertainty values for comparing
measurement results have been reported and discussed as well, so that the very
fundamentals of the experimental approach to signal and information processing have been
covered in the paper.
References
[1] BIPM, IEC, IFCC, ISO, IUPAC, OIML, Guide to the Expression of Uncertainty in Measurement, 1993.
Chapter 3
Neural Networks in Intelligent Sensors
and Measurement Systems
for Industrial Applications
Stefano FERRARI, Vincenzo PIURI
Department of Information Technologies, University of Milan
via Bramante 65, 26013 Crema, Italy
Abstract. This chapter discusses the basic concepts of intelligent instrumentation and
measurement systems based on the use of neural networks. The concept of intelligent
measurement is introduced as a preliminary step in industrial applications to extract
information concerning the monitored or controlled system or plant as well as the
surrounding environment. Implementation of intelligent measurement systems
encompassing neural components is tackled, by providing a comprehensive approach
to optimum system design. Issues and examples concerning the use of neural networks
in intelligent sensing and measurement systems are discussed. The main objective is to
show the feasibility and the usability of these techniques to implement a wide variety
of adaptive sensors as well as to create high-level sensing systems able to extract
abstract measures from physical data, with special emphasis on industrial applications.
and to realize measurement systems that are able to create comprehensive views of the
monitored system by intelligent sensor fusion and adaptation [8]. For an introduction to the
neural computation, refer to [2-7]: in the sequel of the book, the reader is assumed to be
rather familiar with the basic concepts of neural networks.
In Section 3.2 the design issues, technologies and problems are discussed to provide a
comprehensive view of the interacting goals and characteristics that need to be carefully
balanced for an optimum implementation of an intelligent measurement system. Hardware
and software solutions are presented. A comprehensive design methodology is then
introduced. In Section 3.3 the practical use of the neural paradigms is discussed in several
application cases for intelligent sensors and measurement systems, as a fundamental basis
for any industrial applications. Approaches available in the literature are analyzed to show
the effectiveness and the efficiency of the neural-based approaches for the given application
constraints.
connected to all other neurons by weighted links through which its outputs are presented as
inputs to the receiving neurons; inputs from the external environment are delivered to all
neurons. Memory elements are introduced at the neuron's inputs to allow for memorizing
the dynamic behavior of the system. The neural computation is therefore parametric in the
number of neurons, the memory elements, the non-linear functions, and the interconnection
weights. The neural computation is expected to approximate the desired (static or dynamic)
behavior, described by a set of examples, as closely as possible. This view allows for defining a
mathematical approach to the identification of the optimum neural computation that solves
the envisioned application: the problem could be in fact stated as a functional. The solution
of the functional is the best neural computation for the given application problem.
Constraints on the system characteristics can be defined so that solution of the functional
will be constrained. Unfortunately, this approach is not practically feasible, since the
optimization space is too large: its exploration would take an unacceptably long time.
The neural computation needs therefore to be defined in a more efficient way through a
sequence of steps that explores the alternatives by exploiting the knowledge
accumulated by researchers and practitioners around the world over the past twenty years.
To achieve this goal we start from the desired behavior, as defined by the available
examples, and the application constraints (e.g., concerning accuracy, uncertainty, power
consumption, economical cost, etc.).
First of all, the most appropriate neural paradigm must be identified among the wide
spectrum of neural families proposed in the literature. In particular, the overall topology of
the network and the internal structure of the neurons must be selected. If different
alternatives have proven effective in cases similar to the envisioned application, all of
them should be explored in the subsequent steps to finally achieve the most suitable solution.
Selection is in fact usually not immediately feasible at this initial design stage since detailed
characteristics and constraints need to be taken into account; besides, an accurate evaluation
of the performance can be done only when the actual implementation has been selected. For
example, feed-forward neural structures can be adopted in all applications in which a
mathematical function needs to be approximated or for classification when input-output
examples are available. Feedback networks are appropriate for modeling dynamic
behaviors, e.g., in control applications, by using a feed-forward structure with a feedback
loop which supplies the past history to the network inputs through memory elements.
Self-organizing maps are effective for classification when classes are not defined a priori.
The sigmoid function used to generate the neuron's output is one of the most widely used in
theoretical research; in practice, approximated versions outperform the theoretical sigmoid
as far as computation power is concerned.
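As an illustration of the last point, a hardware-friendly piecewise-linear approximation of the sigmoid can be compared with the theoretical function; the breakpoints at +/-4 and the slope 1/8 below are illustrative choices, not prescribed by the chapter:

```python
import math

def sigmoid(x):
    """Theoretical sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_pwl(x):
    """Piecewise-linear approximation: saturation outside [-4, 4] and a
    straight segment in between (illustrative breakpoints)."""
    if x <= -4.0:
        return 0.0
    if x >= 4.0:
        return 1.0
    return 0.5 + x / 8.0

# The approximation trades a small accuracy loss for much cheaper arithmetic
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(x, round(sigmoid(x), 3), round(sigmoid_pwl(x), 3))
```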
Second, the most appropriate network model must be chosen within the selected family
by defining the structural characteristics of the model. Namely, we need to identify the
number of neurons in the network and, in the case of dynamic systems, the length of
memory history. Experience can be useful to make these selections. A theoretical
framework should consider the complexity of the application problem as defined by the set
of examples that characterize the desired behavior. In the literature, some methodological
guidelines have been presented to dimension the network [11,12], also by taking into
account the quantity and the distribution of examples over the field of the desired behavior.
In general, the typical approach is based on tentative cases having different network sizes
and on the analysis of the accuracy achieved in their outputs: a promising range is foreseen
from the literature, then experiments lead to subsequent refinements by focusing the
attention on the most attractive sub-ranges until the probably optimum structure is reached.
Similarly,
we should operate to identify the number of memory elements required to hold the system
history. It is important to point out that the trial-and-error approach that is used to
completely configure the network requires evaluating the accuracy of the outputs and the other
characteristics of the model (e.g., the generalization ability). Consequently, the optimum
dimension of the neural network depends on the optimum configuration of the network
weights that is achieved at the end of the configuration procedure for the envisioned
network structure. To break this loop we need therefore to adopt an iterative approach: we
have to complete the configuration by assuming that the network under consideration has
the optimum size and, then, go back to evaluate whether that network was actually optimum.
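The trial-and-error sizing just described can be sketched as a simple search loop; `train_and_validate` below is a hypothetical callable standing in for a full training-plus-validation run of a network with a given number of neurons:

```python
def search_network_size(candidate_sizes, train_and_validate):
    """Trial-and-error sizing: fully configure (train) a network for each
    candidate size and keep the one with the lowest validation error."""
    best_size, best_err = None, float("inf")
    for n in candidate_sizes:
        err = train_and_validate(n)
        if err < best_err:
            best_size, best_err = n, err
    return best_size, best_err

# Toy stand-in for a training run: the error falls with size, then rises
# again when the larger network overfits (numbers are made up)
errors = {4: 0.30, 8: 0.12, 16: 0.09, 32: 0.15}
print(search_network_size([4, 8, 16, 32], errors.get))  # (16, 0.09)
```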
The third step consists of configuring the neural network interconnection weights by
learning the desired behavior either by a supervised or an unsupervised training procedure.
Many techniques were developed in the literature for the different neural models [2-7]. For
example, several variations of the back-propagation algorithm have been experimented with
for feed-forward networks. Extensions for feedback networks were also studied. Self-adaptation
was proposed for self-organizing maps. Selection of the most suited learning approach can
be performed by searching in the best results presented in the literature for the envisioned
model family and application. Learning must be configured to take into account the actual
characteristics of the implementation that will be adopted. For example, possible
approximations of the theoretical non-linear functions, that are adopted to achieve a better
implementation (e.g., from the point of view of the circuit complexity and power
consumption in the case of dedicated hardware solutions, or of the computation complexity in the
case of software realizations), must be considered also in training to create a consistent
solution. Large network errors and even convergence problems in dynamic systems may
in fact be induced in the application system during the operating life by having trained the
neural model under ideal conditions and, then, by having applied the approximations. This is
the typical case that occurs when training is performed by using a theoretical sigmoid,
while a multi-step function is adopted in the real system.
In the fourth step, the training procedure is applied to configure the operational
parameters of the network model. Two basic issues must be carefully considered since they
greatly affect the quality of the network and, consequently, the accuracy of the outputs:
which data should be used for training and how long learning should be continued. In many
real applications the examples of the desired behavior are available only in a limited
quantity. Often it may not be easy or cheap to collect these examples, for different reasons:
for example, in some cases running the physical experiments to collect the data may be
economically expensive, sometimes there is not enough personnel available to perform the tests,
in other cases the production cannot be suspended to perform experimental runs, and some
operating conditions may be difficult to apply. When a limited set of data is available, it
must be split into two parts: one to actually perform training, the second to validate the
training result (i.e., the characteristics of the network such as the generalization ability, the
robustness, and the accuracy). The validation data should never be used for training in order
to have an impartial evaluation; using training data for validation will result in an optimistic,
sometimes excessively so, evaluation of the network abilities. However, the less training data
are collected, the lower the quality of training and the higher the network error in
generating the desired outputs. Some additional guidelines can be found in the literature to
deal with these issues and to evaluate the related network accuracy, e.g., see [13]. Duration
of training is critical as well. In fact, if learning is too prolonged, the network tends to
overfit the examples and to lose its generalization ability. Training should be applied as long
as the network error decreases when test examples are presented: when the error becomes
steady, training should be terminated. In the case of periodic or continuous learning, the
procedure and the network configuration update must be controlled so as to allow for a high
generalization ability and accuracy. By analyzing the neural model and the validation data,
we can derive also the confidence that we can have on the computation outputs [14].
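A minimal sketch of the two practices discussed above, the hold-out split and the termination of training when the validation error stops improving; the split fraction, patience, and error sequence are illustrative:

```python
import random

def holdout_split(examples, train_fraction=0.7, seed=0):
    """Split a limited example set into a training part and a validation
    part; the validation data must never be used for training."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    k = round(len(data) * train_fraction)
    return data[:k], data[k:]

def early_stop(validation_errors, patience=3, tol=1e-4):
    """Return the epoch at which training should stop: the validation error
    has not improved by more than tol for `patience` consecutive epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, err in enumerate(validation_errors):
        if err < best - tol:
            best, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(validation_errors) - 1

train, valid = holdout_split(range(10))
print(len(train), len(valid))  # 7 3
print(early_stop([1.0, 0.6, 0.4, 0.39, 0.39, 0.39, 0.41]))  # 6
```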
More detailed guidelines to create the neural paradigms can be found in the following
chapters with specific reference to the envisioned specifications and application areas.
After the previous steps, we obtain a configured neural paradigm that is able to solve the
envisioned application problem, possibly with the desired accuracy and uncertainty. It is
worth noting that the configured neural network is an algorithm, since it defines exactly the
sequence of all operations and all operand values required to generate the network outputs
from the current input data. When configured, the computation of each neuron is in fact a
weighted summation followed by a non-linear function, while the topology of the neural
network defines the activation order of the neurons' computation and the data flow. The
difference between neural paradigms and conventional algorithmic approaches consists in the
fact that the algorithm designer has to define the sequence of operations that solve the
application problem, while the neural designer only has to select the computational model;
learning then identifies the exact sequence of operations from the behavior examples.
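The observation that a configured network is an algorithm can be illustrated with a minimal forward pass: each neuron performs a weighted summation followed by a non-linear function, and the topology fixes the activation order. The tiny 2-2-1 network and its weights below are made up for the example, not learnt:

```python
import math

def forward(layers, x):
    """Forward pass of a configured feed-forward network: each neuron
    computes a weighted summation followed by a non-linear function
    (tanh here), and the layer order fixes the activation order."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

# Illustrative 2-2-1 network with already-configured (made-up) weights
net = [
    ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
print(forward(net, [1.0, 2.0]))
```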
In several application cases, neural solutions have been shown superior to algorithmic
approaches, when the design and environmental conditions discussed at the beginning of
this section apply. In many other cases efficiency and accuracy of algorithms remain
outstanding. However, there are several cases in which a suited combination of the
characteristics and properties of both of these computational approaches may lead to more
advanced solutions. The efficiency of algorithms in tackling specific tasks for which they are
known to be effective can in fact be merged with the adaptivity and the generalization ability
from examples of the neural paradigms. This results in the composite systems [9]. In
composite systems the computation is partitioned in algorithmic and neural components to
exploit the best features of each of these approaches. From the high-level functional
description of the application and the related constraints it is therefore necessary to perform
appropriate analysis of the desired behavior to partition the application system and to derive
the high-level description of each algorithmic and neural component. Then, learning allows
for configuring each neural component so as to create its final algorithmic description. The
resulting high-level description of the whole system consists thus of the collection of the
algorithmic description of all components, independently of the way in which the
designer initially described each of them.
3.2.2 Design of the neural implementation
The second complex task for the designer is now the identification of the most suited
solution for implementing the neural computation (or the composite system) that has been
reduced to an algorithmic description for the envisioned application and with the given
constraints. Several approaches have been presented in the literature, with different
performance, cost, power consumption, and accuracy.
Several proposals were made in the literature by using analog hardware (e.g., [15-19]).
Analog integrated circuits for neural computation are based on the fundamental laws of
electric circuits: Kirchhoff's and Ohm's laws. According to Ohm's law, the voltage
across an electric dipole is proportional to the current flowing through it. A linear dipole
can represent a neural synapse: the voltage across the dipole represents a neuron input and
the proportionality constant the related interconnection weight; the current flowing through
the dipole is the weighted input. According to Kirchhoff's current law, the total current
entering a circuit node is null (currents exiting the node are accounted as negative terms). If
the negative poles of the dipoles associated with a neuron are grounded together, the weighted
summation of the neuron's inputs is the total current flowing to the ground. Similar results
can be achieved by using other circuit topologies and devices (e.g., operational amplifiers
and transistors). The use of analog circuits for neural computation is very effective since
computation is performed at a very high speed (i.e., the speed allowed by the propagation
and stabilization of the electric signals), the dimension of the circuit is very small, and all
neural signals are represented by continuous values (thus allowing for theoretically
representing very accurate values). However, there are two main drawbacks that greatly
limit the practical usability of this approach. First, the configuration of the neural system is
fixed at production time; consequently, the interconnection weights cannot be changed at
power up and a specific circuit needs to be fabricated for each application case. Second,
fabrication inaccuracies that are typical of any production process make it impossible to
guarantee a good accuracy of the characteristic parameters of the devices and, consequently,
the accuracy of the neural interconnection weights. This approach should be adopted only if
the overall network behavior is highly robust with respect to the variation of the network
parameters.
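The analog weighted summation described above can be sketched numerically: the conductance of each dipole plays the role of the interconnection weight, Ohm's law gives the per-synapse current, and Kirchhoff's current law performs the summation at the grounded node. The values below are illustrative:

```python
def node_current(input_volts, conductances):
    """Analog weighted summation: each synapse is a linear dipole whose
    conductance G is the weight, so Ohm's law gives the per-synapse
    current I = G * V, and Kirchhoff's current law sums the currents at
    the common grounded node."""
    return sum(G * V for G, V in zip(conductances, input_volts))

# Hypothetical neuron with three weighted inputs (illustrative values)
print(node_current([1.0, -0.5, 2.0], [0.2, 0.4, 0.1]))  # 0.2
```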
Analog hardware with digital weights can be adopted to achieve some configurability of
the interconnection weight (e.g., [20,21]). In this case a mixed-mode multiplier computes
the input weighting. The multiplier (i.e., the weight) is given in the binary representation.
Multiplication is performed in parallel on each multiplier digit by using dedicated
circuitries; the analog multiplicand is presented in parallel to each of these single-digit
multipliers. Each binary digit of the multiplier controls the flow of the current through the
corresponding single-digit multiplier: no current will be generated if the control digit is
zero; otherwise a current proportional to the binary weight of the digit is generated. The
multiplication result is obtained by adding all the currents generated by the single-digit
multipliers, according to Kirchhoff's current law. The performance of this approach is still
very high, and control of the accuracy of the characteristic device parameters is limited.
Interconnection weights are discretized since they are given in the binary representation;
this influences the accuracy of the final outputs. The network dimensions and topology as
well as the neuron's operation are fixed at production time, thus limiting the circuit
flexibility. The circuit size is larger than the pure analog approach since the mixed-mode
multipliers are more complex.
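The digit-parallel behavior of the mixed-mode multiplier can be sketched as follows: each binary digit of the digital weight gates a contribution proportional to the analog multiplicand scaled by the digit's binary weight, and the partial results are summed as the partial currents are at the common node. This is a numerical analogy, not a circuit model:

```python
def mixed_mode_multiply(v_analog, weight_bits):
    """Digit-parallel multiplication: each digit of the binary weight
    (most significant bit first) either contributes nothing (digit 0) or
    the analog multiplicand scaled by 2**k (digit 1); all contributions
    are then summed."""
    return sum(v_analog * (2 ** k) if bit else 0.0
               for k, bit in enumerate(reversed(weight_bits)))

# Weight 1011b = 11, analog input 0.5: the product is 5.5
print(mixed_mode_multiply(0.5, [1, 0, 1, 1]))  # 5.5
```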
Complete control of the accuracy can be achieved by adopting digital dedicated
hardware architectures (e.g., [22-26]): all data are discretized and given in binary
representation and all operations are performed digitally. Interconnection weights are
configurable, but the network topology and size as well as the neuron's behavior are still
fixed at production time. Performance is much lower than in the corresponding analog
implementations due to the nature and the realization of the digital operations, but still it is
rather high. The circuit complexity becomes relevant and, consequently, the integrated
circuit becomes rather large. To limit the size and allow for fabrication, several neural
operators often share some components in time, by introducing suited registers and
clocking schemes; for example, one digital multiplier can be multiplexed among all
interconnection weights of a neuron or the same circuit can compute the operations of
several neurons sequentially. These architectures may have a limited circuit complexity for
some classes of neural networks, e.g., when the neuron output is a single-digit binary value.
The data discretization limits the accuracy, although it is exactly predictable.
The use of configurable digital hardware allows for high configurability (e.g., [27-29]).
The typical approach consists of implementing the neural networks on an FPGA: all
operations are mapped onto the logic blocks and interconnection paths of the FPGA. The
high-level description of the neural operation (e.g., written in C, SystemC, or VHDL
languages) is translated into the corresponding FPGA configuration that will be loaded on
memory-based architectures or will be used to set the operations and interconnections in
fuse-based architectures. Any neural topology and size and any neuron operation can be
accommodated in the FPGA, provided that sufficient logic blocks and interconnections are
available and that an appropriate operation schedule is adopted. Performance is lower than
that of the dedicated digital architectures, since the basic neural operations involve more,
and slower, physical components. Accuracy is influenced by the discretized operands.
Programmable digital architectures provide the highest configurability since the neural
operations are described in suited programs. Since the computation is known, the accuracy
can be evaluated; also in this case accuracy is influenced by the discretized operands.
Neurocomputers were developed to perform the neural computation in an efficient way
by preserving the system flexibility (e.g., [30-32]). The behavior of these architectures is
similar to the one of a conventional computer: the architecture consists of a memory in
which the sequences of specialized operations that describe the neural computation are
stored, and processing units that are able to fetch, decode, and execute these sequences
stored in the memory. To achieve high performance these architectures make use of
dedicated functional units to execute the operations that are the most frequent in the neural
computations, and efficient interconnection structures to distribute the neurons' outputs to
the receiving neurons. The specialized functional units may be implemented in FPGA to
ensure additional flexibility. Any neural network can therefore be implemented by this kind
of architectures, provided that the instructions executable by the processing units are able to
describe the desired neural behavior.
All of the above solutions suffer from the same problem: the more the architecture is
dedicated, the more expensive it becomes since it cannot be mass-produced and reused in a
large number of instances and different applications. To overcome this drawback, non-specialized processors should be adopted so that they can be directly purchased on the
market as off-the-shelf components.
In this perspective, digital signal processors (DSP) are an attractive solution that
combines reasonably high performance with programmability (e.g., [33-35]). These
processors have an architecture that usually includes supports and functional units
specialized for the most frequent signal processing operations, e.g., convolution and
correlation. Since the weighted summation coincides with these operations, it can be
efficiently executed on DSP processors available on the market. The neural computation is
obtained by executing dedicated software written for the selected DSP processor. This
approach anyway requires processors, boards, software development environments, and
programming skills that are less available - and thus more expensive - than for the widely
used general-purpose processing architectures.
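The coincidence between the neuron's weighted summation and the multiply-accumulate (MAC) operation that DSP functional units are optimized for can be made explicit in a minimal sketch:

```python
def weighted_sum(weights, inputs):
    """A neuron's weighted summation is exactly the dot product that the
    multiply-accumulate (MAC) units of DSP processors are built for."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc += w * x  # one MAC operation per interconnection weight
    return acc

print(weighted_sum([0.5, -1.0, 2.0], [1.0, 2.0, 0.25]))  # -1.0
```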
General-purpose processors are the most flexible computing structures for which many
programmers have sufficient knowledge and expertise to produce good programs.
Processors for personal computers are among these structures. For these architectures
dedicated software can be written in high-level programming languages to perform any
neural computation. Performance is lower than in DSP architectures with similar
characteristics since the efficient dedicated supports for DSP operations are not available in
general-purpose systems. To speed up the computation, general-purpose supercomputers can
be used, e.g., [36-38].
To reduce the development costs due to the need of experienced programmers and to
widen the use of neural computation also among practitioners with limited programming
experience, general-purpose architectures with configurable software simulators can be
adopted (e.g., [39]). In these software simulators, through a graphical interface, the designer
can build the neural paradigm to tackle his application; typically he can select - in a
predefined but usually very large set - the desired family of neural networks, the specific
network dimension, and the appropriate weight configuration. In some simulators the
designer is even allowed to create his own network model. Performance is usually
limited since configurability is obtained by interpreting the neural computation, thus
leading to a slow execution. Some of these simulators are however able to produce a
compiled version of the neural computation so as to greatly speed it up with respect to the
interpreted version.
Dedicated software or neural network simulators are also needed to support learning. In
any of these cases the network model adopted for learning must be identical to the one that
will be used in the operating life. In particular, great care is necessary in verifying that all
network characteristics, the precision of the data representation, the accuracy both of each
operation and of the sequences of operations, all data uncertainties are identical in order to
guarantee that the learnt behavior coincides with the one shown during the operational life
of the neural network.
model, the parameters are identified on the available data by statistical techniques. At the end
of the paradigm synthesis, all components are described by algorithms.
The fourth design phase is the hardware/software partitioning that splits the algorithmic
specification of the system into components to be implemented in dedicated analog, digital,
or mixed hardware devices, in configurable hardware components, or in software programs
running on DSP or general-purpose processors. This can be obtained by using one of the
many hardware-software co-design techniques proposed in the literature and widely
available in commercial CAD tools. Partitioning is guided by the non-functional
specifications. It is worth noting that hardware/software partitioning is independent of
computational paradigm partitioning. At the end of this phase the processing system
architecture and the detailed structure of each component are obtained.
The fifth design phase is the synthesis of the processing architecture. This can be
achieved by means of the traditional techniques for system synthesis: programming of the
software components and digital/analog synthesis of the hardware devices (e.g., [42]).
3.3. Application of neural techniques for intelligent sensors and measurement systems
Neural techniques have been shown to be effective and efficient in enhancing the characteristics of
sensors and measurement systems, as well as in industrial applications. In the literature many
perspectives were presented to introduce "intelligence" in these systems by means of neural
networks:
- sensor enhancement allows for creating devices which are able to physically sense
quantities for advanced applications,
- sensor linearization simplifies the use of sensors in measurement systems and
applications by providing an idealized view of the sensor,
- sensor fusion merges information from several sensors, possibly of different type, to
create new combined measurements,
- sensor diagnosis verifies the correct operation of the sensor and detects the possible
presence of errors due to faults,
- virtual sensors indirectly observe quantities for which no specific sensor is available by
using information about quantities related to the desired one,
- remote sensing allows for indirectly measuring physical quantities without using a
sensor that physically enters in contact with the measurand quantity,
- high-level sensors measure abstract quantities (i.e., not directly related to physical
quantities) which are of interest for the applications,
- distributed intelligent sensing systems create a cooperative collection of sensors that
provides a comprehensive view of the system under measure,
- calibration allows for correctly relating the values measured by sensors and
measurement systems to the physical values of the quantities under measurement.
3.3.1 Sensor enhancement
The physical sensing materials have usually complex non-linear behaviors that need to be
related to the corresponding values of the measured quantities. In particular, some physical
characteristics of the sensing material when operating in physical contact with the
measurand quantity vary according to the physical laws that regulate the interaction
between the system under measurement and the measurement system. The varying physical
quantity of the sensing material that best represents the quantity under measurement is
assumed as the output of the sensor: this value is associated with the measurand quantity.
Neural networks can be used to create advanced sensors by suitably processing the physical
outputs of the sensing materials to extract the measurement of the desired physical quantity,
especially when conventional processing techniques have been shown to be inaccurate or
insufficiently adaptive. In some cases, neural approaches are also useful to improve the
accuracy of the measurement procedure so as to enhance the quality of the delivered
measurements. Among sensors that benefit from neural technologies the literature reports
sensors that reproduce the five human senses, as well as sensors for many other
environmental and industrial quantities like mechanical quantities (e.g., distance, force,
pressure), thermical quantities (e.g., temperature), and chemical quantities (e.g.,
concentration, presence of substances).
Image sensors are the basic step to reproduce natural sight. Conventional digital
cameras have image sensors, composed of a grid of sensitive materials, that are able to
capture the light (intensity and color); in each pixel information is transformed into a digital
representation. Advanced image sensors mimic the behavior of the natural photoreceptors
(the elementary components of the retina) in the human eyes, to allow for capturing images
in a more "intelligent" and flexible way [43-45]. Human photoreceptors in fact have self-adaptive abilities to deal with light intensity and color saturation in order to create high-quality images even in the presence of adverse environmental conditions. Besides, the image
characteristics are represented in an impulsive way for subsequent processing by the brain.
The artificial photodetector is obtained by using groups of photodiodes, which are sensitive
to various wavelengths, and interconnection circuits that provide lateral connections and
information processing among neighboring cells. When the photodetector is hit by light
within its sensitivity range, it generates impulses proportional to the light intensity; impulses
are then filtered by taking into account the events that occurred in time and in the areas nearby.
Neural networks are used to implement the non-linear lateral cooperation. This approach
may have several benefits, including less saturation, reduced calibration, higher quality,
higher accuracy, higher time sensitivity, and less power consumption.
By using either the intelligent photodetectors or conventional cameras with suited post-processing, an artificial retina can be implemented, whose behavior is similar to the human
one [46-48], to provide a prosthesis to overcome blindness and severe visual impairments
when the optical nerves and the optical brain functions are still in good condition and
operational. At the moment, the complexity of data processing required to generate
appropriate signals for the brain is too high to be compacted into a small integrated circuit;
besides, power consumption and power supply are still a relevant problem that needs
external batteries and frequent recharges. These constraints prevent - nowadays - the
realization of prostheses for permanent implantation in the human body instead of the natural retina.
However, the feasibility of the approach was demonstrated by using stimulating devices
implanted on the optical nerves and a processing system out of the human body: a prototype
system was even recently implanted on a patient with interesting - although still low
quality - results. The image taken from the image sensor array is transformed into an
impulse-based representation suited for stimulating the optical nerves, also by using a
neural-based approach. The image representation is then coded and transmitted wirelessly to
the receiver implanted in the human body. Received data are decoded and delivered to the
optical nerve stimulators. The image is thus transferred from the artificial eye to the brain
for the usual processing and understanding.
At a higher abstraction level, visual sensors analyze an image or a sequence of images to
detect and understand the objects contained in the images themselves and, eventually, to
observe objects' motion [49-53]. This function mimics the image understanding activity of
the natural brain. Objects are identified by extracting characteristic features from the image
and by comparing the combinations of these features with those of the classes of objects to be
recognized: an object is identified when its features are similar to those of one of such
classes. Motion is detected and analyzed by observing the variations of the features in the
images of the sequence. Neural networks were shown to be effective for these adaptive tasks,
which have many practical applications not only in the medical field, but also in several
industrial and robotics areas whenever image analysis and understanding is important.
Hearing sensors and the artificial cochlea can be realized similarly to the sight aids in
order to assist people with severe hearing impairments with adaptive personalized prostheses
[54,55]. Conventional hearing aids increase the volume of any acoustic signal (voice,
sounds, noise), possibly by filtering out some frequency bands; in general this approach has
limited adaptivity to the patient and delivers too much noise, which makes the patient
uncomfortable. A neural-based approach can outperform the conventional one for the voice:
it can understand the speech and synthesize the voice from the basic phonemes. First a
microphone captures the voice; then signal (also neural) processing detects the boundaries
between words, extracts the phonemes of each word, and identifies the words possibly by
using also a vocabulary. The coded words are then transmitted wirelessly to the implant in the
human body, where they are decoded and used to drive the voice reconstruction by
cascading the appropriate phonemes. This signal is used to stimulate the auditory nerve as
the natural cochlea does.
Odor sensors and the artificial nose were also successfully experimented by using neural
solutions [56-59]. The natural nose identifies odors by detecting the presence and the
quantity of chemicals in the air. It has receptors that are sensitive to some specific classes of
chemicals; the brain merges all olfactory information and classifies the odor on the basis of
its experience and its knowledge of objects' smell. In the artificial nose, sensing materials
react to the presence of some chemical families, possibly different from those of the natural
receptors: these reactions are transduced into electric signals. On the basis of the type of
active sensing materials (i.e., the detected family of chemicals) and the amount of their
activity, the artificial nose classifies the smelled odor. This system was conceived for
automatic odor analysis in industrial applications, e.g., in food-processing factories to identify
rotting food or to grade the maturation level. The effectiveness of the neural approach is due to
the relevant non-linearities of the problem and the difficulty of formulating an algorithmic approach.
Similarly, the natural tongue identifies tastes by analyzing the presence and the quantity
of chemicals on the object touched by the tongue (the saliva transports the chemicals from
the surface of the tasted object to the papillae). In the artificial tongue [57], sensing
materials are used to detect some chemical families that are on the surface of the touched
objects, as in the artificial nose. Classification leads to identify the taste of the object
according to the kind of tastes to which the sensing materials are reacting and to the
knowledge used to configure the classifier. The artificial tongue is also used to mimic its
human counterpart, e.g., in food-processing factories to automatically discriminate different types
and mixes of foods and beverages. It is worth noting that the artificial nose and tongue are based
on the same operating principles: the only difference is how the chemicals are brought to
the sensing devices (through the air in the nose, by contact or through water in the tongue).
Tactile sensors are important for advanced robotics when robotic hands need to take
objects carefully (e.g., delicate, deformable, or slippery objects) or when objects must be
tactilely recognized. The natural skin contains an array of tactile sensors that are able to
observe the three-dimensional field of mechanical force due to gripping an object; from the
analysis of the field of mechanical force, the brain is able to recognize the shape of the
touched surface by comparing the current one to its knowledge. The artificial tactile sensors
reproduce the ability of neurally reconstructing the field of forces from the individual
information coming from the sensing units: from the analysis of the field of mechanical
force they are able to classify the surface shape, to identify the surface state, and to predict
the slipperiness of the grip [60-63].
By using neural techniques several other advanced sensors were developed to measure
mechanical quantities, very well suited for industrial applications. In pressure sensors [64]
the neural networks were used to correlate the strongly non-linear output of a barometric
cell to the corresponding pressure value, by incorporating the specific characteristics of the
cell that vary from one cell to another due to the inaccuracies of the production process.
Adaptive distance sensors can be implemented by adopting a sonar- or laser-based system
[65,66]. Surface roughness can be deduced by analyzing and intelligently merging several of
these measurements taken at a short distance [67]. Velocity and angular velocity can be
measured by adaptive analysis of the position of the envisioned object [68,69]. Other
quantities reported in the literature concern, among the many examples, force [70,71],
torque [72,73], and strain [74].
The use of neural networks has also proved effective in implementing many other sensors
and measurement systems for electromagnetic quantities (e.g., [75]), environmental
quantities (e.g., temperature and humidity [76-79]) and chemical and biological quantities
(e.g., [80-85]). All these cases have many practical implications, especially in a wide
variety of industrial production areas, in the biomedical fields, and in environmental
monitoring. The basic goals of the use of neural technologies are the same as in the cases
presented above: to achieve a better evaluation of the system output, to achieve adaptability
of the measurement system, and to describe the desired behavior in an easier way by
examples.
3.3.2 Sensor linearization
Linearization of the physical output generated by a sensor is useful for many practical
applications in all areas. Many cheap sensors are nowadays available on the market: their
low cost makes them highly desirable to reduce the cost of products and systems. However,
these sensors often have non-linear output functions corresponding to linearly changing
values of the physical measurand quantity. In these cases the subsequent data processing
has to deal with such non-linearity to produce the measured value. For example,
thermocouples typically measure the temperature by producing an electric voltage at their
outputs; this voltage is non-linearly related to the actual temperature. The temperature value
needs to be deduced from a conversion table or function; since the correspondence between
the sensor output and the temperature is often quite difficult to express as a simple
function, a look-up table can be adopted. Performing this conversion with the required
accuracy demands considerable effort, due either to the computational complexity of the
conversion function or to the large size of the look-up table.
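The look-up-table approach can be sketched as follows. This is a minimal illustration: the table entries below are invented for demonstration, whereas a real design would use the published reference table of the specific thermocouple type; intermediate voltages are handled by linear interpolation between the two nearest entries.

```python
import bisect

# Illustrative look-up table: (sensor voltage in mV, temperature in deg C).
# These sample points are hypothetical, not real thermocouple data.
TABLE = [(0.0, 0.0), (1.0, 25.0), (2.2, 55.0), (3.6, 90.0), (5.2, 130.0)]

def voltage_to_temperature(v):
    """Convert a sensor voltage to a temperature by linear interpolation
    between the two nearest look-up-table entries; clamp at the ends."""
    volts = [p[0] for p in TABLE]
    i = bisect.bisect_right(volts, v)
    if i == 0:
        return TABLE[0][1]
    if i == len(TABLE):
        return TABLE[-1][1]
    (v0, t0), (v1, t1) = TABLE[i - 1], TABLE[i]
    return t0 + (t1 - t0) * (v - v0) / (v1 - v0)
```

The accuracy of the conversion grows with the number of table entries, which is exactly the memory cost the text refers to.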
If the sensor output were linear, the conversion would be much easier, since it would be a
simple multiplication by a constant gain, typical of that sensor. The applications could thus
be written in a much simpler way, especially when system control is envisioned.
Unfortunately, the reality cannot be changed to make it ideal. However, it is possible to
preserve the simplified view of a non-linear sensor for the application designers by linearizing
the output of the sensor itself, i.e., linearization can be embedded in the sensing system so
that non-linearities will remain hidden in it. This will not remove the computational or
memory efforts mentioned above since they will remain hidden in the measurement system,
but it will allow for a much simpler use of the sensor in the various applications since the
non-linear sensor and the linearization procedure will constitute a single system.
Linearization can be pursued by several techniques. As already said, the look-up table is
easy to create although it may become expensive in terms of memory usage. To save memory
limiting the dominance of each sensor on the comprehensive view and, consequently,
avoiding biases and enhancing the overall quality.
Sensor fusion for data integration can be implemented by means of a single merging
procedure that computes all refined and combined outputs depicting the comprehensive
view. This is efficient when the interdependencies among the measured quantities are
numerous and each of them involves most of the measured quantities. Alternatively,
individual merging procedures can be adopted to refine each measurement by taking into
account the information provided by the other sensors. This is suited when
interdependencies involve a limited number of different physical quantities for each
measurement to be refined.
3.3.4 Sensor diagnosis
Various causes (e.g., aging) may lead to measurement drifts in sensors. Sensor fusion, e.g., by
neural networks, and the continuous comparison among the samples taken by various
identical sensors at about the same time can be used to detect this phenomenon early and,
eventually, to mask its effects by correcting - as much as possible - the wrong measurements
before recalibrating the drifted sensor itself [93-98]. The correct measure is the one on
which most of the sensors agree, within a given tolerance range of values (this interval is
due to the uncertainty in the measurement: comparison needs to be considered positive not
only if the measured values are identical, but also if their uncertainty intervals overlap).
When a sensor is identified as not sufficiently "reliable" due to drifting, its measurements
can be ignored and decisions about the subsequent measured values taken only on the basis
of the responses of the remaining reliable sensors. This is especially useful when
maintenance and recalibration are difficult, or expensive, or even impossible, or cannot be
performed too frequently.
Similarly, normal wear and accidents in the operating environment may induce faults
into a sensor, i.e., may change the physical structure either of the whole sensor or one or
more of its parts. Some of these faults do not affect the normal operation of the sensor,
which continues to deliver correct outputs, i.e., to produce measures that coincide with the
actual value of the quantity under measurement. Other faults may appear as erroneous
values delivered by the sensor, i.e., different from the outputs that such a sensor would have
delivered in the absence of the fault. Sensor fusion can be adopted to support various
strategies for fault tolerance. First of all, it can be used for sensor error detection.
Comparison of the sensors outputs points out the presence of erroneous measurements and,
thus, of faulty sensors: an error is detected whenever the compared sensors outputs are
different and exceed the intrinsic tolerance due to the measurement uncertainty. The faulty
sensor is the one that disagrees with the value delivered by the other sensors. Error
correction can be realized by majority voting on the sensor outputs (an odd number of sensors
is required): the output value - including the tolerance of the measurement uncertainty - on
which there is the highest consensus is assumed as the correct value, masking the actual
presence of the error. To preserve the detection and correction abilities as much as possible,
fault isolation must be applied, by removing the faulty sensor from the active operation
(i.e., by ignoring its outputs). It is worth noting that these abilities are somewhat reduced
when a sensor becomes faulty and is isolated, since its contribution to the comparisons is
now missing. Repair allows for recovering the sensor in the normal operation and for
restoring the full fault tolerance abilities.
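The voting scheme described above can be sketched in a few lines. This is a hypothetical minimal version, assuming scalar readings and a single tolerance value standing in for the overlap of uncertainty intervals:

```python
def vote(readings, tolerance):
    """Majority voting over redundant sensor readings: two readings
    'agree' when they differ by no more than the tolerance (i.e., their
    uncertainty intervals overlap). Returns the consensus value (the mean
    of the largest agreeing group) and the indices of dissenting sensors."""
    best_value, best_group = None, []
    for r in readings:
        group = [s for s in readings if abs(s - r) <= tolerance]
        if len(group) > len(best_group):
            best_group = group
            best_value = sum(group) / len(group)
    dissenters = [i for i, s in enumerate(readings)
                  if abs(s - best_value) > tolerance]
    return best_value, dissenters
```

With three sensors reading 10.0, 10.1 and 13.0 and a tolerance of 0.5, the third sensor is flagged as faulty and its reading is masked by the consensus of the other two, matching the error detection and correction behavior described in the text.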
Different kinds of sensors measuring the same physical quantity may enhance the fault
tolerance. Different types of sensors will have different wear, or will be subject to different
aging mechanisms, or will have different faults and errors. Diversity minimizes the
probability that sensors are progressively changing in a similar way about at the same time.
Sensors for different physical quantities of the same physical region can be adopted also
to overcome possible drifting or temporary errors in measurements due to local transient
events in sensors that have no specific relevance for the observation of the whole system
[93,95,97,99,100]. Information produced by a sensor that is not consistent with the
whole picture of the system as created by the other sensors can be identified as erroneous
and, thus, ignored.
Instead of relying on comparison among real data, diagnosis can also be performed by
adopting a model-based approach [97,101,102]. A model of the sensor is created by means
of system identification techniques (e.g., by using neural models). The model is exploited to
predict the expected sensor output from the sensor past outputs without any physical
redundancy of the sensor: if the expected output differs too much from the actual one, an
error is detected.
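The model-based scheme can be sketched as follows. A simple linear autoregressive predictor stands in here for the identified (possibly neural) sensor model; the coefficients and threshold are hypothetical:

```python
def detect_error(history, actual, coeffs, threshold):
    """Model-based sensor diagnosis: predict the next sensor output from
    its past outputs with a linear (AR) model, then flag an error when
    the residual between actual and predicted output exceeds the
    threshold. No physical redundancy of the sensor is required."""
    predicted = sum(c * h for c, h in zip(coeffs, reversed(history)))
    return abs(actual - predicted) > threshold, predicted
```

For example, with past outputs [1.0, 2.0, 3.0] and coefficients [2.0, -1.0] the model predicts 4.0; an actual reading of 4.0 is accepted, while a reading of 6.0 exceeds a threshold of 0.5 and is flagged as erroneous.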
3.3.5 Virtual sensors and remote sensing
A real sensor can be used when there are sensing materials and techniques that allow for
observing the desired quantity and when this sensor can be placed in the desired location of
the system to perform the measurement. In some cases, this is not feasible since direct
sensing of the desired physical quantity is not technically viable, or is not convenient for the
application, or can be dangerous for the system and operator safety, or is economically
expensive.
In some cases, although the desired quantity is difficult to measure directly, other
quantities strictly related to it can be observed more easily. An indirect measurement
procedure can thus be created. Sensors are placed in the system (where feasible,
appropriate, or convenient) to observe the quantities that can be directly measured. The
laws that describe the relationships among the quantities measured by these sensors and
with the desired quantity are identified: they can involve mechanical physics, chemistry,
optics, electromagnetism, etc. From these laws it is possible to extract a function that gives
the indirect measurement of the desired quantity from the values of the directly measured
quantities. The sensors for direct measurements and this function constitute a virtual sensor.
It is virtual since there is no physically existing sensor that directly observes the desired
quantity.
Since neural networks can be widely used as function approximators, they are also
effective as data processing tools to merge the values coming from the physical sensors
according to the merging function that computes the indirect measurement by applying the
relationships among the measured quantities (e.g., [103,104]).
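The structure of a virtual sensor can be sketched as follows. Both the direct sensors and the merging function below are hypothetical placeholders; in practice the merging function would encode the identified physical laws, or be approximated by a trained neural network as the text suggests:

```python
import math

class VirtualSensor:
    """A virtual sensor: directly measurable quantities plus a merging
    function yield an indirect measurement of a quantity for which no
    physical sensor is available."""
    def __init__(self, sensors, merge):
        self.sensors = sensors   # callables returning direct measurements
        self.merge = merge       # function computing the indirect value

    def read(self):
        return self.merge(*(s() for s in self.sensors))

# Illustrative use: two direct measurements merged into one indirect one
# (here the merging law is simply the Euclidean norm, as a placeholder).
vs = VirtualSensor([lambda: 3.0, lambda: 4.0],
                   lambda a, b: math.hypot(a, b))
```

Replacing the `merge` callable with a trained feed-forward network gives exactly the neural realization of the virtual sensor discussed above.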
A special case of virtual sensor arises when the quantity to be measured is in a location too
far from the measuring system. In other cases the quantity to be measured involves a wide
region of space and would require too many sensors or iterative sensing over the whole
region, while the desired quantity is only concise information (e.g., an average or
total value). In both types of cases only an indirect measurement technique is
appropriate: in the literature this approach is known as remote sensing.
Examples of these measurements taken from satellites encompass the Earth surface
parameters (e.g., the canopy temperature, the soil temperature, the canopy water content,
and the soil moisture content), the rainfall, the snowfall, the air pollution, the CO emission,
the ozone hole, etc. Also in these applications the neural networks proved their
effectiveness and - sometimes - their superiority in merging information and extracting
concise views [105-109].
(e.g., mobile software agents able to travel in the computer network, or mobile robots). An
agency is a program running on a computer or a computer network to support agent
cooperation. A perceptive agency is an agency in which the agents cooperate to perform
measurements and monitor the desired system. Differently from distributed measurement
systems, in a perceptive agency the components (i.e., the agents) do not know each other in
advance: each declares its features, and cooperation is dynamically built through
an interactive agreement process. Such an approach allows - in particular - for high
modularity, scalability, fault tolerance, and adaptability.
In all of the distributed architectures mentioned above for the sensing and measurement
system, the neural networks can play different relevant roles: they can be used to enhance
the individual sensors, to merge multisensor information as virtual sensors, to support
adaptive remote sensing, and to create high-level sensors based on distributed information.
In summary, they can be used to introduce flexibility and adaptability into distributed
sensing and measurement systems, making these procedures more "intelligent".
3.3.8 Calibration
Calibration [118] is the operation that establishes, under given conditions, the relationship
between the values produced by a sensor or an instrument and the known values of the
measurand. In practice, similarly to sensor linearization, this operation consists in
identifying a relationship to convert the physical sensor output into an ideal sensor output.
The ideal (although not necessarily linear) description of the sensor behavior is appreciated
in applications to specify the desired behavior on the basis of ideal reference sensors, so
as to avoid knowing the actual details of the specific sensor that has been installed in the
system.
Implementation of this conversion relationship may consist either in a look-up table or
in a function. As in sensor linearization, the use of a function allows for saving a large
amount of memory space. To identify the function, global interpolation techniques (e.g.,
Newton's or Lagrange's interpolation) can be adopted: they compute the polynomial - of a
given order - that passes through all calibration samples; coefficients are computed by
looking at the whole interval in which the function has to be defined. Local interpolation
techniques (e.g., splines) look for the polynomial (of a given order) that passes through the
samples contained in small windows (only a few samples long) of the whole function
domain. Regression techniques (e.g., least mean squares) look for functions that
approximate the samples, without necessarily passing through them, by minimizing the
global approximation error.
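The regression approach can be sketched as follows; a minimal least-squares fit of a linear calibration curve, with invented sample values for illustration (real calibration would use measured reference pairs, and a polynomial or neural model where a line is insufficient):

```python
def fit_linear_calibration(raw, reference):
    """Least-squares fit of a linear calibration curve reference = a*raw + b.
    Regression-type calibration: the fitted line need not pass through the
    calibration samples; it minimizes the global approximation error."""
    n = len(raw)
    mx = sum(raw) / n
    my = sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in raw)
    sxy = sum((x - mx) * (y - my) for x, y in zip(raw, reference))
    a = sxy / sxx
    b = my - a * mx
    return a, b
```

A feed-forward neural network plays the same role as this closed-form fit, but can approximate arbitrary non-linear calibration curves without fixing a polynomial order in advance.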
Feed-forward neural networks (as universal function approximators) are another
regression-type technique that can be effectively used to approximate the desired function
described by the sampled calibration data. In several cases neural networks have shown a
higher approximation ability, accuracy, robustness, and generalization ability than
conventional regression techniques at a similar or mildly higher computational complexity
both for static and dynamic calibration (e.g., [119-122]). High generalization ability is
highly appreciated in calibration since it allows the same calibration quality to be achieved
with a smaller number of samples. Conventional regression techniques need to know the
maximum order of the polynomial to be used for approximation; neural networks are
able to autonomously find the best approximation for the given network dimension.
The sensor fusion ability of neural networks can also be exploited to more easily calibrate
sensors whose operating conditions depend on other parameters (e.g., the
temperature for a high-accuracy pressure sensor [122]) as well as to calibrate multisensor
systems (e.g., [123]).
References
[1] L. Cristaldi, A. Ferrero, and V. Piuri, "Programmable instruments, virtual instruments, and distributed measurement systems: what is really useful, innovative and technically sound?," IEEE Instrumentation & Measurement Mag., vol. 2, pp. 20-27, Sept. 1999.
[2] J. Hertz, A. Krogh, and R. G. Palmer, An Introduction to the Theory of Neural Computation. Lecture Notes Volume I, Addison Wesley, 1991.
[3] E. Sanchez-Sinencio and C. Lau, Artificial Neural Networks: Paradigms, Applications, and Hardware Implementations. IEEE Press, Dec. 1992.
[4] J. Zurada, Introduction to Artificial Neural Systems. St. Paul: West Publishing Company, 1992.
[5] L. Fausett, Fundamentals of Neural Networks. Englewood Cliffs: Prentice Hall, 1994.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation. New Jersey, USA: Prentice Hall, 1999.
[7] T. Kohonen, Self-Organizing Maps, vol. 30 of Springer Series in Information Sciences. Berlin, Heidelberg, New York: Springer, 3rd ed., 2001.
[8] C. Alippi, A. Ferrero, and V. Piuri, "Artificial intelligence for instruments and measurement applications," IEEE Instrumentation & Measurement Mag., vol. 1, pp. 9-17, June 1998.
[9] C. Alippi, S. Ferrari, V. Piuri, M. Sami, and F. Scotti, "New trends in intelligent systems design for embedded and measurement applications," IEEE Instrumentation & Measurement Mag., vol. 2, pp. 36-44, June 1999.
[10] C. Alippi, V. Piuri, and F. Scotti, "Accuracy versus complexity in RBF neural networks," IEEE Instrumentation & Measurement Mag., vol. 4, pp. 32-36, Mar. 2001.
[11] A. Weigend and D. Rumelhart, "The effective dimension of the space of hidden units," in Proc. IEEE Int. Joint Conf. on Neural Networks, vol. 3, pp. 2069-2074, 1991.
[12] C. Alippi, R. Petracca, and V. Piuri, "Off-line performance maximization in feedforward neural networks by applying virtual neurons and covariance transformations," in Proc. IEEE Int. Symp. on Circuits and Systems, pp. III.2197-III.2200, Apr. 1995.
[13] N. Murata, S. Yoshizawa, and S. Amari, "Network information criterion - determining the number of hidden units for an artificial neural network model," IEEE Trans. on Neural Networks, vol. 5, pp. 865-872, Nov. 1994.
[14] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic Press, 1972.
[15] Y. Ota and B. Wilamowski, "Analog implementation of pulse-coupled neural networks," IEEE Trans. on Neural Networks, vol. 10, pp. 539-544, May 1999.
[16] H. Abdelbaki, E. Gelenbe, and S. El-Khamy, "Analog hardware implementation of the random neural network model," in Proc. IEEE-INNS-ENNS Int. Joint Conf. on Neural Networks, pp. 197-201, 2000.
[17] G. Indiveri, "A neuromorphic VLSI device for implementing 2D selective attention systems," IEEE Trans. on Neural Networks, vol. 12, pp. 1455-1463, Nov. 2001.
[18] C. Lu, B. Shi, and L. Chen, "Hardware implementation of an on-chip BP learning neural network with programmable neuron characteristics and learning rate adaptation," in Proc. Int. Joint Conf. on Neural Networks, pp. 212-215, 2001.
[19] A. Ogrenci, G. Dundar, and S. Balkir, "Fault-tolerant training of neural networks in the presence of MOS transistor mismatches," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, pp. 272-281, Mar. 2001.
[20] A. Heittmann and U. Ruckert, "Mixed mode VLSI implementation of a neural associative memory," in Proc. Seventh Int. Conf. on Microelectronics for Neural, Fuzzy and Bio-Inspired Systems, pp. 299-306, 1999.
[21] K. Waheed and F. Salam, "A mixed mode self-programming neural system-on-chip for real-time applications," in Proc. Int. Joint Conf. on Neural Networks, vol. 1, pp. 195-200, 2001.
[22] S. Kung, "Tutorial: digital neurocomputing for signal/image processing," in Proc. 1991 IEEE Workshop on Neural Networks for Signal Processing, pp. 616-644, 1991.
[23] M. Yasunaga, N. Masuda, M. Yagyu, M. Asai, K. Shibata, M. Ooyama, M. Yamada, T. Sakaguchi, and M. Hashimoto, "A self-learning digital neural network using wafer-scale LSI," IEEE J. of Solid-State Circuits, vol. 28, pp. 106-114, Feb. 1993.
[24] C. Chin Wang, C. Jung Huang, and Y. Pei Chen, "Design of an inner-product processor for hardware realization of multi-valued exponential bidirectional associative memory," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, pp. 1271-1278, Nov. 2000.
[25] R. Perfetti and G. Costantini, "Multiplierless digital learning algorithm for cellular neural networks," IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 48, pp. 630-635, May 2001.
[26] T. Schoenauer, S. Atasoy, N. Mehrtash, and H. Klar, "NeuroPipe-chip: A digital neuro-processor for
spiking neural networks," IEEE Trans, on Neural Networks, vol. 13, pp. 205213, Jan. 2002.
[27] M. Arroyo Leon, A. Ruiz Castro, and R. Leal Ascencio, "An artificial neural network on a field
programmable gate array as a virtual sensor," in Proc. Third Int. Workshop on Design of Mixed-Mode
Integrated Circuits and Applications, 1999, pp. 114117, 1999.
[28] G. Frank, G. Hartmann, A. Jahnke, and M. Schafer, "An accelerator for neural networks with pulsecoded model neurons," IEEE Trans, on Neural Networks, vol. 10, pp. 527-538, May 1999.
[29] C.-M. Kim and S. Y. Lee, "A digital chip for robust speech recognition in noisy environment," in
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2001, vol. 2, pp. 1089-1092,
2001.
[30] Intel Corp., Santa Clara, CA, 807 70NX ETANN, Data Sheet, 1991.
[31] Y. Sato, K. Shibata, M. Asai, M. Ohki, M. Sugie, T. Sakaguchi, M. Hashimoto, and Y. Kuwabara,
"Development of a high-performance general purpose neuro-computer composed of 512 digital
neurons," in Proc. Int. Joint Conf. on Neural Networks, 1993, vol. 2, pp. 1967-1970, 1993.
[32] U. Ramacher, W. Raab, J. Hachmann, J. Beichter, N. Bruls, M.Wesseling, E. Sicheneder, J. Glass, A.
Wurz, and R. Manner, "SYNAPSE-1: a highspeed general purpose parallel neurocomputer system," in
Proc. 9th Int. Parallel Processing Symp., 1995, pp. 774-781, 1995.
[33] U. Muller, A. Gunzinger, and W. Guggenbuhl, "Fast neural net simulation with a DSP processor
array," IEEE Trans, on Neural Networks, vol. 6, pp. 203-213, Jan. 1995.
[34] M. Murakawa, S. Yoshizawa, I. Kajitani, X. Yao, N. Kajihara, M. Iwata, and T. Higuchi, "The GRD
chip: genetic reconfiguration of DSPs for neural network processing," IEEE Trans, on Computers, vol.
48,pp. 628639, June 1999.
[35] V. Cantoni and A. Petrosino, "Neural recognition in a pyramidal structure," IEEE Trans, on Neural
Networks, vol. 13, pp. 472480, Mar. 2002.
[36] E. Kerckhoffs, F. Wedman, and E. Frietman, "Speeding up backpropagation training on a hypercube
computer," J. ofNeurocomputing, vol. 4, pp. 4363, 1992.
[37] M. Kuga, Y. Namiuchi, B. Apduhan, and T. Sueyoshi, "Implementation and performance evaluation
of a neural network simulator on highly parallel computer AP-1000," in Proc. 1993 Int. Conf. on
Parallel And Distributed Systems, pp. 722-726, July 1993.
[38] X. Liu and G. Wilcox, "Benchmarking of the CM-5 and the Cray machines with a very large
backpropagation neural network," in IEEE Int. Conf. on Neural Networks, 1994, vol. 1, pp. 22-27,
1994.
[39] H. Demuth and M. Beale, Neural Network Toolbox Users Guide. The MathWorks, Inc., 7 ed.. Mar.
2001.
[40] G. D. Micheli and M. Sami, eds., Hardware/Software CoDesign, vol. 310 of NATO ASI Series. Kluwer
Academic Publishers, 1995.
[41] D. D. Gajski, F. Vahid, S. Narayan, and J. Gong, Specification and Design of Embedded Systems.
Englewood Cliffs, New Jersey 07632: Prentice Hall, 1994.
[42] Ptolemy - http://ptolemy.eecs.berkeley.edu/
[43] C. Nilson, R. Darling, and R. Pinter, "Shunting neural network photodetector arrays in analog CMOS,"
1EEEJ. of Solid-State Circuits, vol. 29, pp. 1291-1296, Oct. 1994.
[44] T. Yagi, Y. Hayashida, and S. Kameda, "An analog VLSI which emulates biological vision," in Proc.
Second Int. Conf. on Knowledge-Based Intelligent Electronic Systems, 1998, vol. 3, pp. 454460,
1998.
[45] M. Wilcox and D. Thelen Jr., "A retina with parallel input and pulsed output, extracting highresolution information," IEEE Trans, on Neural Networks, vol. 10, pp. 574583, May 1999.
[46] M. Becker, R. Eckmiller, and R. Hunermann, "Psychophysical test of a tunable retina encoder for
retina implants," in Proc. Int. Joint Conf. on Neural Networks, 1999, vol. 1, pp. 192-195, 1999.
[47] W. Liu, K. Vichienchom, M. Clements, S. DeMarco, C. Hughes, E. McGucken, M. Humayun, E. D.
Juan, J. Weiland, and R. Greenberg, "A neuro-stimulus chip with telemetry unit for retinal prosthetic
device," IEEE J. of Solid-State Circuits, vol. 35, pp. 1487-1497, Oct. 2000.
[48] C.-Y. Wu, L.-J. Lin, and K.-H. Huang, "A new light-activated CMOS retinal-pulse generation circuit
without external power supply for artificial retinal prostheses," in The 8th IEEE Int. Conf. on
Electronics, Circuits and Systems, 2001, vol. 2, pp. 619622, 2001.
[49] S. Watanabe and M. Yoneyama, "An ultrasonic visual sensor for threedimensional object recognition
using neural networks," IEEE Trans, on Robotics and Automation, vol. 8, pp. 240-249, Apr. 1992.
[50] C.-F. Chiu and C.-Y.Wu, 'The design of rotation-invariant pattern recognition using the silicon
retina," IEEEJ. of Solid-State Circuits, vol. 32, pp. 526534, Apr. 1997.
39
[51] Z. Lu and B. Shi, "Subpixel resolution binocular visual tracking using analog VLSI vision sensors,"
IEEE Trans, on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, pp. 14681475, Dec. 2000.
[52] N. Goerke, R. Schatten, and R. Eckmiller, "Enhancing active vision by a neural movement predictor,"
in Proc. Int. Joint Conf. on Neural Networks, 2001, vol. 2, pp. 1312-1317, 2001.
[53] G. Foresti and S. Gentili, "A hierarchical classification system for object recognition in underwater
environments," IEEE J. of Oceanic Engineering, vol. 27, pp. 6678, Jan. 2002.
[54] M. Leisenberg, "Hearing aids for the profoundly deaf based on neural net speech processing," in Proc.
Int. Conf. on Acoustics, Speech, and Signal Processing, 1995, vol. 5, pp. 3535-3538, 1995.
[55] C.-H. Chang, G. Anderson, and P. Loizou, "A neural network model for optimizing vowel recognition
by cochlear implant listeners," IEEE Trans, on Neural Systems and Rehabilitation Engineering, vol. 9,
pp. 4248, Mar. 2001.
[56] C. Di Natale, A. Macagnano, R. Paolesse, E. Tarizzo, A. D'Amico, F. Davide, T. Boschi, M. Faccio,
G. Ferri, F. Sinesio, F. Bucarelli, E. Moneta, and G. Quaglia, "A comparison between an electronic
nose and human olfaction in a selected case study," in Proc. Int. Conf. on Solid State Sensors and
Actuators, 1997, vol. 2, pp. 1335-1338, 1997.
[57] P. Wide, F. Winquist, P. Bergsten, and E. Petriu, "The human-based multisensor fusion method for
artificial nose and tongue sensor data," IEEE Trans, on Instrumentation and Measurement, vol. 47, pp.
1072-1077, Oct. 1998.
[58] R. Dowdeswell and P. Payne, "Odour measurement using conducting polymer gas sensors and an
artificial neural network decision system," Engineering Science and Education Journal, vol. 8, pp.
129134, June 1999.
[59] E. Mines, E. Llobet, and J. Gardner, "Electronic noses: a review of signal processing techniques," IEE
Proc. Circuits, Devices and Systems, vol. 146, pp. 297-310, Dec. 1999.
[60] G. Canepa, M. Morabito, D. De Rossi, A. Caiti, and T. Parisini, "Shape from touch by a neural net," in
Proc. IEEE Int. Conf. on Robotics and Automation, 1992, vol. 3, pp. 2075-2080, 1992.
[61] W. McMath, M. Colven, S. Yeung, and E. Petriu, "Tactile pattern recognition using neural networks,"
in Proc. Int. Conf. on Industrial Electronics, Control, and Instrumentation, 1993, vol. 3, pp. 13911394,1993.
[62] A. Caiti, G. Canepa, D. De Rossi, F. Germagnoli, G. Magenes, and T. Parisini, "Towards the
realization of an artificial tactile system: fmeform discrimination by a tensorial tactile sensor array and
neural inversion algorithms," IEEE Trans, on Systems, Man and Cybernetics, vol. 25, pp. 933-946,
June 1995.
[63] G. Canepa, R. Petrigliano, M. Campanella, and D. D. Rossi, "Detection of incipient object slippage by
skin-like sensing and neural network processing," IEEE Trans, on Systems, Man and Cybernetics, Part
B, vol. 28, pp. 348-356, June 1998.
[64] J. Patra, A. Kot, and G. Panda, "An intelligent pressure sensor using neural networks," IEEE Trans, on
Instrumentation and Measurement, vol. 49, pp. 829-834, Aug. 2000.
[65] S. Aisawa, K. Noguchi, and T. Matsumoto, "Neural processing-type displacement sensor employing
multimode waveguide," IEEE Photonics Technology Letters, vol. 3, pp. 394-396, Apr. 1991.
[66] A. Carullo, F. Ferraris, S. Graziani, U. Grimaldi, and M. Parvis, "Ultrasonic distance sensor
improvement using a two-level neural-network," IEEE Trans, on Instrumentation and Measurement,
vol. 45, pp. 677682, Apr. 1996.
[67] K. Zhang, C. Butler, Q. Yang, and Y. Lu, "A fiber optic sensor for the measurement of surface
roughness and displacement using artificial neural networks," IEEE Trans, on Instrumentation and
Measurement, vol. 46, no. 4, pp. 899-902, 1997.
[68] J. Kramer, R. Sarpeshkar, and C. Koch, "Pulse-based analog VLSI velocity sensors," IEEE Trans, on
Circuits and Systems II: Analog and Digital Signal Processing, vol. 44, pp. 86-101, Feb. 1997.
[69] G. Brasseur, "Modeling of the front end of a new capacitive finger-type angular-position sensor,"
IEEE Trans, on Instrumentation and Measurement, vol. 50, pp. 111116, Feb. 2001.
[70] K.-J. Xu and C. Li, "Dynamic decoupling and compensating methods of multi-axis force sensors,"
IEEE Trans, on Instrumentation and Measurement, vol. 49, pp. 935-941, Oct. 2000.
[71] M. H. Choi and W. W. Lee, "A force/moment sensor for intuitive robot teaching application," in Proc.
IEEE Int. Conf. on Robotics and Automation, 2001, vol. 4, pp. 40114016, 2001.
[72] B. Fahimi, G. Suresh, and M. Ehsani, "Torque estimation in switched reluctance motor drive using
artificial neural networks," in Proc. 23rd Int. Conf. on Industrial Electronics, Control and
Instrumentation, 1997, vol. 1, pp. 21-26, 1997.
[73] F. Discenzo, F. Merat, D. Chung, and P. Unsworth, "Low-cost optical neural-net torque transducer," in
IEE Colloquium on Intelligent and Self-Validating Sensors (Ref. No. 1999/160), pp. 15/1-15/4, 1999.
40
[74] W. Bock, E. Porada, and M. Zaremba, "Neural processing-type fiberoptic strain sensor," IEEE Trans.
on Instrumentation and Measurement, vol. 41, pp. 1062-1066, Dec. 1992.
[75] J. Dias Pereira, O. Postolache, and P. Silva Girao, "A temperature compensated system for magnetic
field measurements based on artificial neural networks," IEEE Trans, on Instrumentation and
Measurement, vol. 47, pp. 494498, Apr. 1998.
[76] C. Chan, W. Jin, A. Rad, and M. Demokan, "Simultaneous measurement of temperature and strain: an
artificial neural network approach," IEEE Photonics Technology Letters, vol. 10, pp. 854856, June
1998.
[77] S.-L. Tsao, J. Wu, and B.-C. Yeh, "High-resolution neural temperature sensor using fiber Bragg
gratings," IEEEJ. of Quantum Electronics, vol. 35, pp. 1590-15%, Nov. 1999.
[78] A. Chatterjee, S. Munshi, M. Dutta, and A. Rakshit, "An artificial neural linearizer for capacitive
humidity sensor," in Proc. 17th IEEE Instrumentation and Measurement Technology Conf., 2000, vol.
1, pp. 313-317,2000.
[79] M. Dawson, A Fung, and M. Manry, "A robust statistical-based estimator for soil moisture retrieval
from radar measurements," IEEE Trans. on Geoscience and Remote Sensing, vol. 35, pp. 5767, Jan.
1997.
[80] H.-K. Hong, H. W. Shin, H. S. Park, D. H. Yun, C. H. Kwon, K. Lee, S.-T. Kim, and T. Moriizumi,
"Gas identification using oxide semiconductor gas sensor array and neural-network pattern
recognition," in The 8th Int. Conf. on Solid-State Sensors and Actuators, 1995 and Eurosensors IX,
vol. 1, pp. 687690, 1995.
[81] T. Lu and J. Lerner, "Spectroscopy and hybrid neural network analysis," Proc. IEEE, vol. 84, pp. 895
-905, June 1996.
[82] M. Giacomini, C. Ruggiero, S. Bertone, and L. Calegari, "Artificial neural network identification of
heterotrophic marine bacteria based on their fatty-acid composition," IEEE Trans, on Biomedical
Engineering, vol. 44, pp. 11851191, Dec. 1997.
[83] A. Pardo, S. Marco, and J. Samitier, "Nonlinear inverse dynamic models of gas sensing systems based
on chemical sensor arrays for quantitative measurements," IEEE Trans, on Instrumentation and
Measurement, vol. 47, pp. 644651, June 1998.
[84] S. Osowski and K. Brudzewski, "Hybrid neural network for gas analysis measuring system," in Proc.
16th IEEE Instrumentation and Measurement Technology Conf., 1999, vol. 1, pp. 440444, 1999.
[85] T. Sobanski, A. Szczurek, and B. Licznerski, "Application of sensor array and artificial neural network
for discrimination and qualification of benzene and ethylbenzene," in 24th Int. Spring Seminar on
Electronics Technology: Concurrent Engineering in Electronic Packaging, 2001, pp. 150153, 2001.
[86] M. Attari, F. Boudjema, and M. Heniche, "An artificial neural network to linearize a G (tungsten vs.
tungsten 26% rhenium) thermocouple characteristic in the range of zero to 2000C," in Proc. IEEE
Int. Symp. on Industrial Electronics, 1995, vol. 1, pp. 176-180, 1995.
[87] G. Dempsey, N. Alt, B. Olson, and J. Alig, "Control sensor linearization using a microcontroller-based
neural network," in Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics, 1997, vol. 4, pp. 3078
3083, 1997.
[88] N. Medrano-Marques and B. Martin-del-Brio, "Sensor linearization with neural networks," IEEE
Trans, on Industrial Electronics, vol. 48, pp. 1288-1290, Dec. 2001.
[89] S. Ben-Yacoub, Y. Abdeljaoued, and E. Mayoraz, "Fusion of face and speech data for person identity
verification," IEEE Trans, on Neural Networks, vol. 10, pp. 1065-1074, Sept. 1999.
[90] A. Filippidis, L. Jain, and N. Martin, "Multisensor data fusion for surface land-mine detection," IEEE
Trans, on Systems, Man and Cybernetics, Pan C, vol. 30, pp. 145-150, Feb. 2000.
[91] Z. Zhang, S. Sun, and F. Zheng, "Image fusion based on median filters and SOFM neural networks: a
three-step scheme," Signal Processing, vol. 81, pp. 1325-1330, June 2001.
[92] Y. Xia, H. Leung, and E. Bosse, "Neural data fusion algorithms based on a linearly constrained least
square method," IEEE Trans, on Neural Networks, vol. 13, pp. 320-329, Mar. 2002.
[93] M. Napolitano, G. Silvestri, D. Windon II, J. Casanova, and M. Innocenti, "Sensor validation using
hardware-based on-line learning neural networks," IEEE Trans, on Aerospace and Electronic Systems,
vol. 34, pp. 45668, Apr. 1998.
[94] O. Postolache, P. Girao, H. Ramos, and J. Dias Pereira, "A temperature sensor fault detector as an
artificial neural network application," in Proc. MELECON 98, 9th Mediterranean Electrotechnical
Conf., 1998, vol. 1, pp. 678682, 1998.
[95] Y. Liu, Y. Shen, and H. Hu, "A new method for sensor fault detection, isolation and accommodation,"
in Proc. 16th IEEE Instrumentation and Measurement Technology Conf. 1999, vol. 1, pp. 488492,
1999.
[96] T. Long, E. Hanzevack, and W. Bynum, "Sensor fusion and failure detection using virtual sensors," in
Proc. 1999 American Control Conf., vol. 4, pp. 2417 -2421, 1999.
[97] G. Betta and A. Pietrosanto, "Instrument fault detection and isolation: state of the art and new research
trends," IEEE Trans, on Instrumentation and Measurement, vol. 49, pp. 100107, Feb. 2000.
[98] A. Sachenko, V. Kochan, V. Turchenko, V. Golovko, J. Savitsky, A. Dunets, and T. Laopoulos,
"Sensor errors prediction using neural networks," in Proc. IEEE-INNS-ENNS Int. Joint Conf. on
Neural Networks, 2000, vol. 4, pp. 441446, 2000.
[99] H. Jin, C. Chan, H. Zhang, and W. Yeung, "Fault detection of redundant systems based on B-spline
neural network," in Proc. 2000 American Control Conf., vol. 2, pp. 1215-1219, 2000.
[100] G. Yen and W. Feng, "Winner take all experts network for sensor validation," in Proc. 2000 IEEE Int.
Conf. on Control Applications, 2000, pp. 92-97, 2000.
[101] E. Eryurek and B. Upadhyaya, "Sensor validation for power plants using adaptive backpropagation
neural network," IEEE Trans, on Nuclear Science, vol. 37, pp. 10401047, Apr. 1990.
[102] S. Naidu, E. Zafiriou, and T. McAvoy, "Use of neural networks for sensor failure detection in a
control system," IEEE Control Systems Mag., vol. 10, pp. 49-55, Apr. 1990.
[103] K. Cohen, Y. Hu, W. Tompkins, and J.Webster, "Breath detection using a fuzzy neural network and
sensor fusion," in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, 7995, vol. 5, pp.
3491-3494, 1995.
[104] A. Chong, S. Wilcox, and J. Ward, "Prediction of gaseous emissions from a chain grate stoker boiler
using neural networks of ARX structure," IEE Proc. Science, Measurement and Technology, vol.
148, pp. 95-102, May 2001.
[105] Y. Ninomiya, "Quantitative estimation of SiO2 content in igneous rocks using thermal infrared spectra
with a neural network approach," IEEE Trans, on Geoscience and Remote Sensing, vol. 33, pp. 684691, May 1995.
[106] D. Tsintikidis, J. Haferman, E. Anagnostou, W. Krajewski, and T. Smith, "A neural network approach
to estimating rainfall from spaceborne microwave data," IEEE Trans. on Geoscience and Remote
Sensing, vol. 35, pp. 1079-1093, Sept. 1997.
[107] P. Chang and L. Li, "Ocean surface wind speed and direction retrievals from the SSM/I," IEEE Trans.
on Geoscience and Remote Sensing, vol. 36, pp. 1866-1871, Nov. 1998.
[108] Y.-A. Liou, Y. Tzeng, and K. Chen, "A neural-network approach to radiometric sensing of landsurface parameters," IEEE Trans, on Geoscience and Remote Sensing, vol. 37, pp. 2718-2724, Nov.
1999.
[109] C. Clerbaux, J. Hadji-Lazaro, S. Payan, C. Camy-Peyret, and G. Megie, "Retrieval of CO columns
from IMG/ADEOS spectra," IEEE Trans, on Geoscience and Remote Sensing, vol. 37, pp. 16571661, May 1999.
[110] B. Arrue, A. Ollero, and J. M. de Dios, "An intelligent system for false alarm reduction in infrared
forest-fire detection," IEEE Intelligent Systems, vol. 15, pp. 64-73, May 2000.
[111] T. Chady, M. Enokizono, R. Sikora, T. Todaka, and Y. Tsuchida, "Natural crack recognition using
inverse neural model and multi-frequency eddy current method," IEEE Trans, on Magnetics, vol. 37,
pp. 2797-2799,July 2001.
[112] T. Chady, M. Enokizono, and R. Sikora, "Signal restoration using dynamic neural network model for
eddy current nondestructive testing," IEEE Trans, on Magnetics, vol. 37, pp. 3737-3740, Sept. 2001.
[113] G. Pottie and W. Kaiser, "Wireless integrated network sensors," Communications of the ACM, vol. 43,
pp. 51-58, May 2000.
[114] R. Van Dyck and L. Miller, "Distributed sensor processing over an ad hoc wireless network:
simulation framework and performance criteria," in IEEE Military Communications Conf., 2001, vol.
2, pp. 894-898, 2001.
[115] L. Guibas, "Sensing, tracking and reasoning with relations," IEEE Signal Processing Mag., vol. 19,
pp. 73-85, Mar. 2002.
[116] F. Zhao, J. Shin, and J. Reich, "Information-driven dynamic sensor collaboration," IEEE Signal
Processing Mag., vol. 19, pp. 6172, Mar. 2002.
[117] F. Amigoni, A. Brandolini, G. DAntona, R. Ottoboni, and M. Somalvico, "Artificial intelligence in
science of measurements and the evolution of the measurements instruments: A perspective
conception," in Proc. 2002 IEEE Int. Symp. on Virtual and Intelligent Measurement Systems, pp. 2631, May 2002.
[118] J. Dias Pereira, P. Silva Girao, and O. Postolache, "Fitting transducer characteristics to measured
data," IEEE Instrumentation & Measurement Mag., vol. 4, pp. 26-39, Dec. 2001.
42
[119] W. Bock, E. Porada, M. Beaulieu, and T. Eftimov, "Automatic calibration of a fiber-optic strain sensor
using a self-learning system," IEEE Trans, on Instrumentation and Measurement, vol. 43, pp. 341346, Apr. 1994.
[120] P. Kluk and R. Morawski, "Static calibration of transducers using parametrization and neural-networkbased approximation," in Proc. IEEE Instrumentation and Measurement Technology Conf., 1996, vol.
1, pp. 581-585, June 1996.
[121] D. Massicotte, S. Legendre, and A. Barwicz, "Neural-network-based method of calibration and
measurand reconstruction for a high-pressure measuring system," IEEE Trans, on Instrumentation and
Measurement, vol. 47, pp. 362-370, Apr. 1998.
[122] R. Schultz, "Applications of neural networks for transducer calibration and signal processing of
transducer data containing periodic interference," in Proc. American Control Conf., 1999, vol. 3, pp.
1661-1662, June 1999.
[123] T.-F. Lu, G. C. Lin, and J. R. He, "Neural-network-based 3D force/torque sensor calibration for robot
applications," Engineering Applications of Artificial Intelligence, vol. 10, pp. 87-97, Feb. 1997.
Chapter 4
Neural Networks in System Identification
Gabor HORVATH
Department of Measurement and Information Systems
Budapest University of Technology and Economics
Magyar tudosok korutja 2, 1521 Budapest, Hungary
Abstract. System identification is an important way of investigating and
understanding the world around us. Identification is a process of deriving a
mathematical model of a predefined part of the world, using observations. There are
several different approaches to system identification, and these approaches utilize
different forms of knowledge about the system. When only input-output
observations are used, a behavioral or black box model can be constructed. In black
box modeling neural networks play an important role. The purpose of this paper is to
give an overview of the application of neural networks in system identification. It
defines the task of system identification, presents the basic questions and introduces
the different approaches that can be applied. It deals with the basic neural network
architectures and the capabilities of neural networks, and shows why
neural networks are applied in system identification. The paper presents the main
steps of neural identification and details the most important special problems that
must be solved when neural networks are used in system modeling. The general
statements are illustrated by a complex real-world industrial application example,
where important practical questions and the strengths and weaknesses of neural
identification are also discussed.
4.1. Introduction
System identification is the process of deriving a mathematical model of a system using
observed data. Modeling is an essential way of exploring, studying and
understanding the world around us. A model is a formal description of a system, which is a
separated part of the world. A model describes certain essential aspects of a system.
In system modeling three main principles have to be considered: separation,
selection and parsimony.
The world around us is a collection of objects that are in interaction with each other:
the operation of one object may influence the behavior of others. In modeling we
have to separate one part of the world from all the rest. This part is called the system to be
modeled. Separation means that the boundaries which separate the system from its
environment have to be defined.
The second key principle is selection. Selection means that in modeling only some
essential aspects of a system are considered. There are many different interactions between
the parts of a system and between the system and its environment. However, in a modeling
task not all interactions can be considered: some types of interactions have to be taken into
account while others must be neglected. The selection of the aspects to be considered
depends on the final goal of modeling. Some aspects are important and must be represented
in one case, while entirely different aspects are to be represented in another case, even if the
system is the same. This means that a model is always imperfect; it is a simplified
description of the system.
Model classes can be categorized in different ways depending on the aspects taken into
consideration.
Based on the system characteristics we can distinguish between
- static or dynamic,
- deterministic or stochastic,
- continuous-time or discrete-time,
- lumped-parameter or distributed-parameter,
- linear or non-linear,
- time-invariant or time-variant, etc.
models.
All these differentiations are important for the further steps of the whole identification
process.
Independently of the previous aspects, we can build parametric or nonparametric
models.
In parametric models a definite model structure is selected and only a limited number of
parameters must be estimated using observations. In many cases there is some physical
insight about the system: we know what important parts of the system can be distinguished,
how these parts are connected, etc., so we know the structure of the model. In these cases
physical models can be built. Physical models are typical parametric models, where the
structure of the model is determined using physical insight.
In nonparametric models there is no definite model structure; the system's behavior
is described by the response of the system to special excitation signals. Nonparametric
models can be built when we have less knowledge about the system. Typical nonparametric
descriptions of a system are the impulse response or the frequency characteristics.
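The distinction can be illustrated with a small sketch (the FIR system, its coefficients and the signal lengths below are hypothetical, chosen only for illustration): a parametric model fixes a structure and estimates a few coefficients from input-output data, while a nonparametric description records the impulse response directly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-tap FIR system (coefficients chosen for illustration).
true_h = np.array([0.5, 0.3, 0.1])
u = rng.standard_normal(200)               # excitation signal
y = np.convolve(u, true_h)[:len(u)]        # noise-free system response

# Parametric route: assume a 3-tap FIR structure and estimate its
# 3 parameters by least squares on a matrix of delayed inputs.
U = np.column_stack(
    [np.concatenate([np.zeros(k), u[:len(u) - k]]) for k in range(3)]
)
h_param, *_ = np.linalg.lstsq(U, y, rcond=None)

# Nonparametric route: describe the system by its impulse response,
# obtained here by applying a unit impulse as excitation.
impulse = np.zeros(10)
impulse[0] = 1.0
h_nonparam = np.convolve(impulse, true_h)[:10]

print(np.round(h_param, 3))       # recovers the three coefficients
print(np.round(h_nonparam[:3], 3))
```

Both routes describe the same input-output behavior; the parametric one needs the structural assumption (three taps), the nonparametric one needs a special excitation.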
4.2.1 Model set selection
Model set selection is basically determined by the available information. The more
information is available, the better the model that can be constructed and the greater the
similarity between the system and its model. Based on prior information we can speak about
white box, grey box or black box models.
When both the structure and the parameters of the model are completely known - complete
physical knowledge is available - we have a white box model. White box models
can be constructed from the prior information without the need for any observations.
When the model construction is based only on observed data, we speak about an
input-output or behavioral model. An input-output model is often called an empirical or black box
model, as the system to be modeled is considered as a black box, which is characterized
by its input-output behavior without any detailed information about its structure. In black
box modeling the model structure does not reflect the structure of the physical system, thus
the elements of the model structure have no physical meaning. Instead, a model
structure has to be chosen that is flexible enough to represent a large class of systems.
Of course the white box and the black box models represent extremes. Models actually
employed usually lie somewhere in between. In most identification tasks we have
certain physical information, but it is not complete (incomplete theoretical
knowledge). We can construct a model whose structure is selected using the available
physical insight, so the structure of the model will correspond to that of the physical
system. The parameters of the model, however, are not known or only partly known, and
they must be estimated from observed data. The model will be fitted empirically using
observations. Physical modeling of this kind is a typical example of grey-box modeling. The more
complete the physical insight, the "lighter" the grey box model that can be obtained, and vice versa.
The "darkness" of the model depends on the known and missing information, as shown in
Figure 2.
The approach used in modeling depends not only on prior information, but also on the
complexity of the modeling procedure and the goal of modeling. As building black
box models may be much simpler than physical modeling, it is used not only when the lack
of physical insight does not let us build physical models, but also in cases when we
have enough physical knowledge but it is too complex, there are mathematical difficulties,
the cost of building physical models is too high, etc.
In black box modeling - contrary to physical modeling - the model structure is not
determined entirely by selecting the model class. We also have to determine the size of the
structure, i.e., the number of model parameters (e.g., in a polynomial model class the maximum
order of the polynomial). To determine the proper size of the model and the numerical
values of the parameters, additional information about the system has to be used. This
additional information can be obtained from observations. For collecting observations we
have to design experiments: design input signals and measure the output signals as
responses to these inputs.
4.2.2 Experiment design
Experiment design plays an important role in obtaining relevant observations. In the step of
experiment design the circumstances of input-output data collection are determined and the
excitation signals are designed. The construction of the excitation signal depends on the prior
knowledge about the system. For example, different excitation signals have to be used to
identify a linear and a non-linear system; the excitation depends on whether the system is
static or dynamic, deterministic or stochastic, etc. In non-linear system identification the
selection of the excitation signal depends on the required validity range of the model. Different
excitations can be used if model validity is required only in the neighborhood of an
operating point, or if a model is needed that reflects some important aspects of the
system in many different operating points.
In general we have to select input signals that excite the system in such a way that
the input-output data observed during the experiment carry enough information
about the system. In system identification it is often required to design new and
significantly modified experiments during the identification process, where the knowledge
collected from the previous experiments is utilized.
In many cases experiment design means determining what signals can be measured at
all, so this step depends largely on the practical identification task. In some identification
problems there is no possibility to design the excitation; we can only measure the input and
output data available under normal operating conditions. This situation may happen when
experiment design would be too expensive, or when the system to be modeled is an
autonomous one that operates without explicit input signals, etc.
The general and special questions of experiment design are beyond the scope of this
paper; interested readers can consult relevant books, e.g. [1,2].
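As a minimal illustration of the idea (the signal lengths and amplitude ranges below are arbitrary choices, not taken from this chapter), an excitation intended for a linear model around one operating point can use only two levels, while an excitation for a non-linear model over a wide validity range should also sweep amplitudes:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Two-level pseudo-random excitation: rich in frequency content, a common
# choice when a linear model around one operating point is the target.
binary = np.where(rng.random(n) < 0.5, -1.0, 1.0)

# Amplitude-varying random excitation: visits many operating points,
# which matters when a non-linear model must be valid over a wide range.
wide_range = rng.uniform(0.0, 2.0, n) * np.where(rng.random(n) < 0.5, -1.0, 1.0)

print(np.unique(binary))                      # only two signal levels
print(np.unique(np.round(wide_range, 3)).size)  # many distinct amplitudes
```

A non-linear system driven only by the binary signal would reveal its behavior at just two input levels, so the resulting model could not be valid elsewhere; this is the kind of consideration experiment design has to capture.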
4.2.3 Model parameter estimation
Model set selection means that the relation between the inputs and outputs of a system is
formulated in a general mathematical form. This mathematical form defines the structure of
the model and defines a set of parameters, the values of which have to be determined
during the further steps of the identification process. In the sequel we assume that the
system implements an f: R^N → R mapping; the scalar output is used only for
simplicity. This mapping is represented by a set of input-output measurement data
{x(i), y(i)}, i = 1, ..., P.
The relation between the input and output measurement data can be described as

y(i) = f(x(i)) + n(i),

where n(i) is the observation noise.
This system will be modeled by a general model structure; the mapping of the model f_M will approximate in some sense the mapping of the system. The model also implements an R^N → R mapping:

y_M(i) = f_M(x(i), θ),

where y_M is the output of the model and θ is the parameter vector of the model structure.
Having selected a parametrized model class, the parameters of the model have to be
determined. There are well-developed methods, which give estimates for the numerical
values of the parameters. These parameter estimation methods utilize different types of
knowledge available about the system to be modeled. We may have prior information about
the nature of the parameters to be determined (e.g., we may have physical knowledge about
the possible range of certain parameters, we may have information if some parameters are
deterministic ones or can be considered as random variables with known probability
distribution, etc.), but the essential part of the knowledge used for parameter estimation is a
set of measurement data, a set of observations Z^P = {x(i), y(i)}, i = 1, ..., P, about the system.
Parameter estimation is a way of adjusting the model parameters for fitting the
observations according to some criterion function. The parameter estimation process is
shown in Figure 3.
Depending on the criterion function (which also may depend on the prior information
about our system) we can speak about least square (LS) estimation, weighted least square
(WLS) estimation, maximum likelihood (ML) estimation or Bayes estimation.
A criterion function is a measure of the quality of the model; it is a function of the error between the model output y_M and the system output y, computed over Z^P, the set of measured data pairs.
If both model structure and model size are fixed, model parameters have to be
estimated. In parameter estimation the selection of criterion function mainly depends on the
prior information. The most common measure of discrepancy is the sum of squared errors,

C(θ) = Σ_{i=1}^P (y(i) − y_M(i))²,   (5)

or the average of the squared error between the model outputs and the observations, which is often called the empirical risk:

C_emp(θ) = (1/P) Σ_{i=1}^P (y(i) − y_M(i))²,   (6)

i.e., usually quadratic criterion functions are used.
Quadratic criterion function can always be applied, because it requires only the
observed input - output data of the system and the output data of the model for the known
input data. The parameter estimate based on this quadratic criterion function is the least squares estimate:

θ̂_LS = argmin_θ C(θ).   (7)
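As an illustration of the least squares estimate of Eq. (7), the following sketch fits a linear-in-the-parameters model to noisy data; the model form and the "true" parameter values are assumptions of this demo, not taken from the text.

```python
import numpy as np

# Sketch of least-squares parameter estimation for a linear-in-the-parameters
# model y = theta_0 + theta_1 * x (a hypothetical example system).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
true_theta = np.array([0.5, 2.0])          # [bias, slope], assumed for the demo
y = true_theta[0] + true_theta[1] * x + 0.05 * rng.standard_normal(x.size)

# Regressor matrix: one column of ones (bias) and one column of inputs.
Phi = np.column_stack([np.ones_like(x), x])

# LS estimate: theta = argmin ||y - Phi @ theta||^2.
theta_ls, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta_ls)  # close to [0.5, 2.0]
```

For a linear-in-the-parameters model the quadratic criterion has a single minimum, so the estimate is obtained in closed form rather than iteratively.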
The observations are noisy measurements, so if something is known about the statistical
properties of the measurement noise some statistical estimation can be applied. One of the
most common statistical estimates is the maximum likelihood (ML) estimate, which is the parameter value that makes the given observations most probable:

θ̂_ML = argmax_θ p(Z^P | θ),   (8)

where p(Z^P | θ) denotes the conditional probability density function of the observations.
The maximum likelihood estimate is illustrated in Figure 4.
If the parameter to be estimated is a random variable and its probability density function is known, we can apply Bayes estimation. Although Bayes estimation has certain optimality properties, it is rarely applied because it requires more prior information than ML or LS estimation.
There is no space to discuss the classical estimation methods in detail. Many excellent books and papers deal with the classical system identification methods; they also give detailed discussions of parameter estimation methods, especially for linear dynamic systems, see e.g. [17].
From this point of view black box identification is similar to the general identification case, except that there is no other knowledge about the system than the observations:

Z^P = {x(i), y(i)}, i = 1, ..., P.   (9)

A black box model gives a relationship between the observed inputs and outputs. The mapping of the model can be described as

y_M(k) = f_M(x(k), θ),   (10)

where θ is the parameter vector of the model.
There are several different forms of this relationship; a general form can be described as a weighted sum of given basis functions:

f_M(x, θ) = Σ_{j=1}^M w_j g_j(x).   (11)
There are many possible basis function sets that can be applied successfully in system identification (nonlinear function approximation). For example, we can form polynomial functions, when the mapping of the system is approximated by a polynomial, or we can use complex exponentials, which means that the mapping of the system is approximated by a Fourier series. Taylor expansions, wavelets or Volterra series can also be applied. Among the black box structures neural networks play an important role. The selection between these possibilities is usually based on prior information about the system, or on some general (theoretical or practical) advantages or drawbacks of the different black box architectures.
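The weighted-sum-of-basis-functions form of Eq. (11) can be sketched numerically. Here a polynomial basis g_j(x) = x^j is chosen purely for illustration; the "system" being approximated is an assumed test function.

```python
import numpy as np

# Black box model as a weighted sum of fixed basis functions (Eq. (11)),
# with a polynomial basis chosen for this demo.
rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 100)
y = np.sin(2.0 * x) + 0.02 * rng.standard_normal(x.size)   # unknown "system"

M = 6                                              # number of basis functions
G = np.column_stack([x ** j for j in range(M)])    # basis function responses
w, *_ = np.linalg.lstsq(G, y, rcond=None)          # weights of the expansion

y_model = G @ w
print(np.max(np.abs(y_model - y)))                 # small residual
```

Because the basis functions are fixed, the weights enter linearly and can be estimated by ordinary least squares; choosing how many basis functions to use is the model size problem discussed next.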
Having selected a basis function set two problems must be solved: (i) how many basis
functions are required in this representation, and (ii) how the parameters of the model can
be estimated. The first question belongs to the model selection problem, the selection of the
size of the model, while the second question is a parameter estimation problem.
The answers to these questions can be divided into two groups. There are general
solutions, which are valid for all black box modeling approaches, and there are special
results which apply only for a given black box architecture. The general answers are related
mainly to the model size problem, while for the parameter estimation task different
methods have been developed for the different black box architectures. Most of these
methods are discussed in detail in the basic literature of system identification, here only
such methods will be presented that are directly related to neural modeling.
The next sections give an overview of neural networks, present the most important neural architectures and the most important features of the neural paradigm, and show why neural networks are important in system modeling. The special problems and difficulties of neural modeling, and possible solutions to avoid these difficulties, will also be discussed.
4.4. Neural networks
Neural networks are distributed information processing systems made up of a great number of highly interconnected identical or similar simple processing units, which perform local processing and are arranged in an ordered topology. An important feature of these networks is their adaptive nature: their knowledge is acquired from the environment through an adaptive process called learning. The construction of neural networks uses this iterative process instead of the conventional construction steps (e.g., programming) of a computing device. The roots of neural networks are in neurobiology; most neural network architectures mimic biological neural networks, although in engineering applications this neurobiological origin has only limited importance and limited effects.
In neural networks several slightly different elementary neurons are used, however, the
neural networks used for system modeling usually apply two basic processing elements.
The first one is the perceptron and the second is the basis function neuron.
The perceptron is a nonlinear model of a neuron. This simple neural model consists of two basic parts: a linear combiner and a nonlinear activation function. The linear combiner computes the scalar product of the input vector x of the neuron and a parameter vector (weight vector) w:

s = w^T x = Σ_{i=0}^N w_i x_i.   (12)

Every element of the weight vector determines the strength of the connection from the corresponding input. As x_0 = 1 by convention, w_0 serves as a bias value. The bias has the effect of increasing or decreasing the input signal level of the activation function depending on its sign. The nonlinear activation function is applied to the output of the linear combiner. It is
responsible for the nonlinear behavior of the neuron model. The mapping of the elementary
neuron is:
y = g(s) = g(w^T x),   (13)

where g(.) denotes the nonlinear activation function. In most cases the activation function is a monotonically increasing smooth squashing function, as it limits the permissible amplitude range of the output to some finite value. The typical activation functions belong to the family of sigmoidal functions. The most common members of this family are the logistic function,

y = sgm(s) = 1 / (1 + e^(−s)),

and the hyperbolic tangent function,

y = tanh(s) = (1 − e^(−2s)) / (1 + e^(−2s)).   (14)
The basis function neuron computes its output as y = g(x, c), where g(.) is a nonlinear basis function and c is a parameter of the basis function. Typical basis functions are the radially symmetric functions, like the Gaussian function, where c is a centre parameter. Gaussian basis functions have another parameter, the width σ, as a Gaussian function is given by:

g_i(x) = exp(−‖x − c_i‖² / (2σ_i²)).   (15)
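The two neuron nonlinearities can be checked numerically. This sketch evaluates the sigmoids of Eq. (14), verifies the standard identity relating them, and shows the locality of the Gaussian basis function of Eq. (15); the centre and width values are arbitrary demo choices.

```python
import numpy as np

# The logistic and hyperbolic tangent squashing functions of Eq. (14),
# and a (scalar) Gaussian local basis function as in Eq. (15).
def sgm(s):
    return 1.0 / (1.0 + np.exp(-s))

def gauss(x, c, sigma):
    return np.exp(-np.abs(x - c) ** 2 / (2.0 * sigma ** 2))

s = np.linspace(-5.0, 5.0, 101)
# The two sigmoids are related by tanh(s) = 2*sgm(2s) - 1.
assert np.allclose(np.tanh(s), 2.0 * sgm(2.0 * s) - 1.0)

# The Gaussian is local: it responds strongly only near its centre c.
print(gauss(0.2, 0.2, 0.1), gauss(0.9, 0.2, 0.1))  # 1.0 and nearly 0
```

The locality of the Gaussian contrasts with the global response of the sigmoids, which is one root of the practical differences between RBF networks and MLPs discussed below.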
Both neuron types can be used in many different neural architectures. Here only such architectures will be discussed as can be used for system modeling. For constructing a neural network, first its architecture must be selected, then the free parameters of the architecture must be determined. To select the architecture we must determine what type and how many elementary neurons are to be used and how they should be organized into a certain, usually regular, structure. The values of the free parameters can be determined using the networks' adaptive nature, their learning capability.
System identification usually means identification of dynamic systems, so when dealing
with neural architectures the emphasis will be on dynamic neural networks. However, as
dynamic networks are based on static ones, first a short overview of the basic static neural
architectures will be given.
For presenting the most important dynamic neural structures two different approaches
will be followed. We will begin with the classical dynamic neural architectures, then a
general approach will be shown, where the nonlinear dynamic mapping is represented as a
nonlinear function of a regressor vector. Using this approach, which has been introduced in
linear dynamic system identification, we can define important basic nonlinear dynamic
model classes.
4.5. Static neural network architectures
The most common neural architecture is the multi-layer perceptron (MLP). An MLP is a
feed-forward network built up of perceptron-type neurons, arranged in layers. An MLP has
an input layer, one or more hidden layers and an output layer. In Figure 5 a single hidden
layer multi-input, multi-output MLP is shown. An MLP is a fully connected network, which means that every node (neuron) in each layer of the network is connected to every other neuron in the adjacent forward layer. The k-th output of a single hidden layer MLP can be written as:

y_k = Σ_{j=0}^M w_kj^(2) g( Σ_{i=0}^N w_ji^(1) x_i ).   (16)

Here w_kj^(l) denotes a weight of the MLP, which belongs to the k-th neuron in layer l and which is connected to the j-th neuron's output of the previous layer. The g(.)-s in Eq. (16) stand for the activation functions. In the figure w^(l) contains all weights of layer l.
Figure 5. A single hidden layer multi-input, multi-output MLP, with input layer, hidden layer and output layer.
Perhaps the most important question arising about MLPs concerns their computational or modeling capability. The main result here is that a one-hidden-layer feed-forward MLP with a sufficient number of hidden processing elements of sigmoidal type, and a single linear output neuron, is capable of approximating any continuous function f: R^N → R to any desired accuracy.
There are several slightly different mathematical results formulating the universal
approximation capability, the most important of which were developed by Hornik [8],
Cybenko [9], Funahashi [10], Leshno et al. [11], etc. Here only the result of Cybenko will
be cited:
Let g be any continuous sigmoid-type function. Then, given any continuous real-valued function f on [0,1]^N (or any other compact subset of R^N) and ε > 0, there exist vectors w^(1) and w^(2) and a parametrized function f_M(x, w^(1), w^(2)): [0,1]^N → R such that

|f_M(x, w^(1), w^(2)) − f(x)| < ε for all x ∈ [0,1]^N,   (17)

where

f_M(x, w^(1), w^(2)) = Σ_{j=1}^M w_j^(2) g(w_j^(1)T x).   (18)

In Eq. (18) w^(1) = [w_1^(1), w_2^(1), ..., w_M^(1)] is the weight matrix of the first computing layer (what is usually called the hidden layer), where w_j^(1) ∈ R^(N+1), j = 1, 2, ..., M, is the weight vector of the j-th hidden neuron, and w^(2) = [w_1^(2), w_2^(2), ..., w_M^(2)]^T is the weight vector of the output layer.
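The one-hidden-layer structure of Eq. (18) can be sketched directly. The weights below are random placeholders (training them is a separate parameter estimation step), and the dimensions are arbitrary demo choices.

```python
import numpy as np

# Forward pass of a one-hidden-layer MLP with a linear output neuron:
# f_M(x) = sum_j w2[j] * g(w1[j] . [x, 1]), as in Eq. (18).
rng = np.random.default_rng(3)
N, M = 2, 8                            # input dimension, number of hidden units
w1 = rng.standard_normal((M, N + 1))   # hidden-layer weights (incl. bias weight)
w2 = rng.standard_normal(M)            # output linear combiner weights

def mlp(x):
    x_ext = np.append(x, 1.0)          # augment input with constant 1 for bias
    hidden = np.tanh(w1 @ x_ext)       # sigmoidal hidden layer
    return w2 @ hidden                 # single linear output neuron

print(mlp(np.array([0.3, -0.7])))      # a single scalar output
```

Since |tanh| < 1, the output magnitude is bounded by the sum of the absolute output weights, reflecting the squashing role of the sigmoidal hidden layer.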
Figure 6: General network with nonlinear hidden layer and linear output layer.
The network has two computing layers: the first is responsible for an R^N → R^M nonlinear mapping, which results in an intermediate vector g(k) = [g_1(k), g_2(k), ..., g_M(k)]^T.
The elements of this intermediate vector are the responses of the basis functions. The
output of the mapping is then taken to be a linear combination of the basis functions.
In an MLP the basis functions are parametrized sigmoidal functions where the
parameters are the weight values of the hidden layer. So a single hidden layer MLP has two
parameter sets: w(1) consists of all weights of the hidden layer and w (2) is formed from the
weights of the output linear combiner.
There are several further neural network architectures, which also implement weighted
sum of basis functions, but where these basis functions are not sigmoidal ones.
When radial basis functions are used the Radial Basis Function (RBF) neural network is
obtained, but the Cerebellar Model Articulation Controller (CMAC) [12] and the
Functional Link Network (FLN) [13] or the Polynomial Neural Network (PNN) [14], etc.
are also elements of the two-computing-layer networks, where nonlinear mapping is
implemented only in the first (hidden) layer.
Perhaps the most important member of this family, and the second most popular network architecture after the MLP, is the RBF. In an RBF network all neurons of the first computing layer simultaneously receive the N-dimensional real-valued input vector x, so this layer consists of basis function neurons. The outputs of these neurons are not calculated using the weighted-sum/sigmoidal activation mechanism as in an MLP. Instead, the output of each hidden basis function neuron is obtained by calculating the "closeness" of the input x to an N-dimensional parameter vector c_j associated with the j-th hidden unit. The response of the j-th hidden element is given by:

g_j(x) = g(‖x − c_j‖).   (19)
Typical radial basis functions are the Gaussian functions of Eq. (15), where the c_j vectors are properly selected centres and the σ_j values are the width parameters of the basis functions. The centres are all different for the different hidden neurons; the width parameters may also differ, but often a common width parameter σ is used for all basis functions. A Gaussian function is a local basis function whose locality is determined by the width parameter.
The RBF networks, similarly to the MLPs, are also universal approximators [15], where the degree of accuracy can be controlled by three parameters: the number of basis functions used, their locations (the centre parameters) and their widths. Because of the similar modeling capabilities of MLPs and RBFs, they are alternative neural architectures in black box system identification.
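A minimal RBF network sketch, with fixed Gaussian centres and a common width so that only the linear output weights need to be estimated (the problem then becomes linear-in-the-parameters). The centres on a grid, the width value, and the test function are all assumptions of this demo.

```python
import numpy as np

# RBF network: Gaussian hidden layer (Eq. (15)) + linear output combiner.
rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 80)
y = np.sin(2.0 * np.pi * x) + 0.02 * rng.standard_normal(x.size)

centres = np.linspace(0.0, 1.0, 10)    # c_j, all different
sigma = 0.1                            # common width parameter

# Hidden-layer responses g_j(x) = exp(-|x - c_j|^2 / (2 sigma^2)).
G = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * sigma ** 2))

w, *_ = np.linalg.lstsq(G, y, rcond=None)   # output-layer weights by LS
y_model = G @ w
print(np.max(np.abs(y_model - y)))          # small approximation error
```

With centres and widths fixed, the error surface in the output weights is quadratic, so the single minimum can be found in closed form; this is exactly the situation discussed later for linear-in-the-parameters networks.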
Besides their similarities, these two architectures differ from each other in several aspects. These differences, although they do not influence the essential modeling capability, may be important from a practical point of view. One architecture may require a smaller number of nodes and parameters than the other; there may be significant differences between the learning speeds of the two architectures, etc. However, all these differences can be considered technical ones; their detailed discussion is beyond the scope of this paper. Interested readers can consult some excellent books, e.g. [16,17].
CMAC is also a feed-forward network with similar capability. It uses hidden units with local basis functions at predefined positions. In the simplest case, the binary CMAC [12], finite-support rectangular basis functions are used, but higher-order CMACs can also be defined, where higher-order basis splines are applied as local basis functions [17]. The modeling capability of a CMAC is slightly inferior to that of an MLP [18,19] (a binary CMAC implements a piecewise linear mapping, and only higher-order CMACs can implement continuous input-output mappings), but it has significant implementation advantages, especially when embedded hardware solutions are required [20].
4.6. Dynamic neural architectures
The basic neural network architectures presented in the previous section all implement a static nonlinear mapping between their inputs and output,

y(k) = f_M(x(k), θ),   (20)

that is, the output at a discrete time step k depends only on the input at the same time instant. Static networks can be applied for static nonlinear system modeling.
In black box system identification, however, the really important task is to build models
for dynamic systems. In dynamic systems the output at a given time instant depends not
only on its current inputs, but on the previous behavior of the system. Dynamic systems are
systems with memory.
4.6.1 Extensions to dynamic neural architectures
There are several ways to form dynamic neural networks using static neurons; in all of them we use storage elements and/or apply feedback. Both approaches can result in several different dynamic neural network architectures.
Storage elements can be used in different parts of a static network. For example, some
storage modules can be associated with each neuron, with the inputs or with any
intermediate nodes of a static network. As an example a feed-forward dynamic network can
be constructed from a static multi-input - single-output network (e.g., from an MLP or
RBF) if a tapped delay line is added as shown in Figure 7. This means that the static
network is extended by an embedded memory, which stores the past values of the inputs.
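The tapped-delay-line idea can be sketched as follows: the static network's input at time k is the vector [x(k), x(k−1), ..., x(k−N)]. The "static network" here is just a placeholder function standing in for a trained MLP or RBF.

```python
import numpy as np

# Feed-forward dynamic network built from a static network plus an
# input tapped delay line (the structure shown in Figure 7).
N = 3                                   # number of delays
x = np.arange(10, dtype=float)          # input sequence x(0), x(1), ...

def static_net(v):
    return v.sum()                      # placeholder for an MLP/RBF mapping

outputs = []
for k in range(N, len(x)):
    regressor = x[k - N:k + 1][::-1]    # [x(k), x(k-1), ..., x(k-N)]
    outputs.append(static_net(regressor))

print(outputs[0])  # sum of x(3), x(2), x(1), x(0) = 6.0
```

The delay line gives the static network a finite memory of length N, which is exactly the NFIR structure introduced later.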
Figure 7. Feed-forward dynamic network: a tapped delay line storing x(k), x(k−1), ..., x(k−N) feeds a multi-input, single-output static network.
Tapped delay lines can be used not only in the input signal path, but at the intermediate
nodes of the network or in the output signal path.
A feed-forward dynamic neural architecture can also be obtained if tapped delay lines
are applied for the inputs of all neurons, that is all weights of a static network are replaced
by linear filters. If finite impulse response (FIR) filters are used, the resulted dynamic
architecture is the FIR-MLP, which is shown in Figure 8.
The output of the i-th neuron in layer l is given as:

y_i^(l)(k) = g( Σ_j w_ij^(l)T x_j^(l)(k) ),   (21)

where w_ij^(l) is the weight vector of the FIR filter connecting the j-th neuron of the previous layer to the i-th neuron; its elements are associated with the corresponding taps of the FIR filter. The input vector x_j^(l)(k) of this filter is formed from the delayed outputs of the j-th neuron of the previous layer.
Figure 8. FIR-MLP feed-forward neural network architecture.
If the tapped delay line is used in the output signal path, a feedback architecture can be constructed, where the inputs, or some of the inputs, of a feed-forward network consist of delayed outputs of the network. The resulting network is a recurrent one. A possible architecture where tapped delay lines are used both in the input and in the output signal paths is shown in Figure 9.
Figure 9. Recurrent architecture: a multi-input, single-output static network fed by tapped delay lines on both the input x(k), ..., x(k−N) and the delayed outputs y(k−1), ..., y(k−M).
These dynamic neural networks are general dynamic nonlinear modeling architectures
as they are based on static networks with universal approximation property. In these
architectures dynamics is introduced into the network using past values of the system
inputs, of the intermediate signals and/or of the outputs.
The structure in Figure 9 applies global feedback from the output to the input. However, dynamic behavior can also be obtained if local feedback is used. In this case not the network's output but the outputs of one or more neurons are applied as inputs of either the same or different neurons. Some possibilities are shown in Figure 10. Typical such dynamic neural architectures are the Jordan and the Elman networks [21].
A further possibility for constructing dynamic neural networks is to combine static neural networks and dynamic linear systems. Within this approach both feed-forward and feedback architectures can be defined, as proposed by Narendra [22]. In Figure 11 some combined architectures are shown; in the figure N stands for static neural networks, while H(z) denotes linear dynamic systems.
Figure 10. Dynamic networks with local feedback at the hidden and output layers.
The model of Figure 11 a) is also known as the Hammerstein model, while the model of b) is the Hammerstein-Wiener model [2]. Similarly to the Hammerstein model, a Wiener model can be constructed where the order of the static nonlinear part and the dynamic linear part is reversed. There is also a model structure called the Wiener-Hammerstein model, which is similar to model b) except that a static nonlinear system is placed between two linear dynamic ones.
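A quick sketch of the Hammerstein idea (static nonlinearity N followed by a linear dynamic system H(z)), and of the Wiener model obtained by swapping the blocks. Both blocks here are simple illustrative choices, not structures identified from data.

```python
import numpy as np

# Hammerstein model: static nonlinearity, then a linear FIR filter.
rng = np.random.default_rng(5)
x = rng.standard_normal(50)

def static_nl(u):
    return np.tanh(u)                   # static neural-type nonlinearity N

b = np.array([0.5, 0.3, 0.2])           # FIR coefficients of H(z), assumed

v = static_nl(x)                        # intermediate signal after N
y = np.convolve(v, b)[: len(x)]         # output of the linear dynamic part

# A Wiener model simply swaps the order of the two blocks:
y_wiener = static_nl(np.convolve(x, b)[: len(x)])
print(y[:3])
```

Note that the two cascades generally produce different outputs: the nonlinearity and the convolution do not commute, which is why the Hammerstein and Wiener structures are distinct model classes.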
Figure 11. Combined architectures built from static neural networks (N) and linear dynamic systems (H(z)): models (a)-(d).
A general nonlinear dynamic model can be given in the form y_M(k) = f_M(φ(k), θ), where θ is the parameter vector and φ(k) denotes the regressor vector.
The regressor can be formed from past inputs, past system outputs, past model outputs, etc., according to the model structure selected. The following regressors can be defined.

When only the past inputs are used the regressor is formed as

φ(k) = [x(k), x(k−1), ..., x(k−N)]^T,   (25)

and the NFIR model is obtained. When the regressor also contains past system outputs, the NARX model can be constructed. This model is often called a series-parallel model [22]: although it uses feedback, the feedback comes from the system's output and not from the model's output, which lets us avoid forming a truly recurrent model architecture.
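Assembling such regressors from recorded data can be sketched as follows; the sequences and the memory lengths N and M are arbitrary demo values. Once the regressor is fixed, fitting the model reduces to a static approximation problem.

```python
import numpy as np

# Building NARX regressors phi(k) = [x(k),...,x(k-N), y(k-1),...,y(k-M)]
# from recorded input and system output sequences.
x = np.arange(1.0, 9.0)                 # recorded inputs x(0)..x(7)
y = np.arange(10.0, 18.0)               # recorded system outputs y(0)..y(7)
N, M = 1, 2                             # input and output memory lengths

rows = []
for k in range(max(N, M), len(x)):
    phi = np.concatenate([x[k - N:k + 1][::-1],    # x(k), ..., x(k-N)
                          y[k - M:k][::-1]])       # y(k-1), ..., y(k-M)
    rows.append(phi)
Phi = np.vstack(rows)

print(Phi[0])  # [x(2), x(1), y(1), y(0)] = [3. 2. 11. 10.]
```

For a NOE model the y entries would be replaced by past model outputs, which must be generated recursively; this is what makes NOE training genuinely recurrent.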
The regressor can also be formed from the past inputs and past model outputs,

φ(k) = [x(k), ..., x(k−N), y_M(k−1), ..., y_M(k−M)]^T.   (26)

The corresponding structure is the NOE model. In a NOE model there is feedback from the model output to its input, so this is a recurrent network. Sometimes the NOE model is called a parallel model [22]. Because of its recurrent architecture serious stability problems may arise, which cannot be easily handled.
In the NARMAX model the past inputs, the past system outputs and the past model outputs are all used. Usually the past model outputs are used to compute the past values of the difference between the outputs of the system and the model,

ε(k−i) = y(k−i) − y_M(k−i).   (27)

In this equation y_M(k−i) is the model output when only the past inputs are used. The corresponding regressor is

φ(k) = [x(k), ..., x(k−N), y(k−1), ..., y(k−M), ε(k−1), ..., ε(k−M)]^T.   (30)
Although the definitions of these general model classes differ from the definitions of the classical dynamic neural architectures, those structures can be classified according to these general classes. For example, an FIR-MLP is an NFIR network, but the combined models a) and b) in Figure 11 also belong to the NFIR model class, while the neural structure of Figure 9 is a typical NOE model.
The selection of the proper model class for a given identification problem is not an easy
task. Prior information about the problem may help in the selection, although these model
classes are considered as general black box architectures and black box approach is usually
used if no prior information is available.
The general principle of parsimony can also help in selecting among the several possible model classes. As formulated by Occam's razor, we should always select the simplest model that is consistent with the observations. This means that we should start with linear models, and only if the modeling accuracy is not good enough should we go further to the more complex NFIR, NARX, NOE, etc., model structures.
The selection of model structure is only the first step of the neural model construction,
further important steps are required: to determine the model size and the model parameters.
All these steps need the validation of the model, so model class and model size selection, as well as model parameter estimation, cannot be done independently of model validation. The question of model size selection will be discussed in the section on model validation; some basic questions of parameter estimation, i.e. learning, are the subject of the next section.
4.7. Model parameter estimation, neural network training
In neural networks the estimation of parameters, the determination of the numerical values of the weights, is called learning. As mentioned, learning is an iterative process in which the weight values of the network are adjusted step by step until the best fit between the observed data and the model is achieved. The learning rules of neural networks can be categorized as supervised learning, which is also referred to as learning with a teacher, and unsupervised learning. In both cases the learning process utilizes the knowledge available in the observation data, called the training data.
4.7.1 Training of static networks
Neural networks used for system modeling are trained with supervised training. In this case the weights of a neural network are modified by applying a set of labeled training samples Z^P = {x(i), y(i)}, i = 1, ..., P. Each training sample consists of a unique input x(i) and a corresponding
desired output y(i). During training all samples are applied to the network: a training sample is selected, usually at random, from the training set, the input is applied to the network and the corresponding response of the network is calculated; then this response, the output of the network, is compared to the corresponding desired output. For evaluating the network response, a criterion function is defined, which is a function of the difference between the network's output and the desired output.
The network output (and the modeling error too) depends on the network parameters θ; here θ consists of all weights of the neural network. Usually a quadratic criterion function is used: the most common measure of discrepancy for neural networks is the squared error,

C(θ) = (1/2) Σ_{i=1}^P (y(i) − y_M(i))².   (32)

In many cases the criterion function is extended with a regularization term,

C_reg(θ) = C(θ) + λ C_r(θ),   (33)

where C(θ) is the standard criterion function, C_r is a so-called regularization term and λ is the regularization parameter, which represents the relative importance of the second term.
This approach is based on the regularization theory developed by Tikhonov [24]. The
regularization term usually adds some constraint to the optimization process. The constraint
may reflect some prior knowledge (e.g., smoothness) about the function approximated by
the network, can represent a complexity penalty term, or in some cases it is used to improve
the statistical stability of the learning process.
When regularization is used for complexity reduction, the regularization term can be defined as the sum of the squares of all weights of the network:

C_r = Σ_i w_i².   (34)
Using this term in the criterion function the minimization procedure will force some of
the weights of the network to take values close to zero, while permitting other weights to
retain their relatively large values. The learning procedure using this penalty term is called
weight-decay procedure. This is a parametric form of regularization as the regularization
term depends on the parameters of the network.
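For a linear-in-the-parameters model the weight-decay criterion has a closed-form minimizer (ridge regression), which makes the shrinking effect easy to demonstrate. The data and the λ values below are arbitrary demo choices.

```python
import numpy as np

# Weight decay sketch: minimizing ||y - Phi w||^2 + lam * sum(w_i^2)
# shrinks the estimated weights toward zero as lam grows.
rng = np.random.default_rng(6)
Phi = rng.standard_normal((30, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = Phi @ w_true + 0.1 * rng.standard_normal(30)

def ridge(lam):
    # Closed-form minimizer of the regularized quadratic criterion.
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

w_plain = ridge(0.0)
w_decay = ridge(10.0)
print(np.linalg.norm(w_plain), np.linalg.norm(w_decay))  # second is smaller
```

For a general neural network the same penalty is simply added to the criterion minimized by gradient descent, producing the weight-decay update described above.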
There are other forms of regularization, like

C_r = Φ(f(x)),   (35)

where Φ(f(x)) is some measure of the smoothness of the network mapping. This latter is a typical form of nonparametric regularization. Regularization can often lead to significantly improved network performance.
The performance measure is a function of the network parameters; the optimal weights
values of the network are reached when the criterion function has a minimum value. For
neural networks used for function approximation the criterion function is a continuous
function of the parameter vector, thus it can be interpreted as a continuous error surface in
the weight space. From this point of view network training is nothing else than a minimum
seeking process, where we are looking for a minimum point of the error surface in the
weight space.
The error surface depends on the definition of the criterion function and the neural
network architecture. For networks having trainable weights only in the linear output layer
(e.g., networks with architecture shown in Figure 6) and if the sum of squares error is used
as criterion, the error surface will be a quadratic function of the weight vector; the error
surface will have a general multidimensional parabolic form. In these networks the first
layer is responsible for the nonlinear mapping, but this nonlinear mapping has no adjustable
parameters. These networks implement nonlinear but linear-in-the-parameters mappings. Typical networks with a quadratic error surface are an RBF network with fixed centre and width parameters, and a CMAC network, where there are no trainable parameters in the first, nonlinear layer.
The consequence of the parabolic error surface is that there will be a single minimum,
which can be located using rather simple ways. For a quadratic error surface analytic
solution can be obtained, however even for such cases usually iterative algorithms, e.g.,
gradient search methods are used. In gradient-based learning algorithms first the gradient of
the error surface at a given weight vector should be determined, then the weight vector is
modified in the direction of the negative gradient:

w(k+1) = w(k) − μ∇(k).   (36)

Here ∇(k) is the gradient of the error surface at the k-th iteration, and μ is a parameter called the learning rate, which determines the size of the step taken in the direction of the negative gradient.
Eq. (36) is a general form of the gradient algorithm. For networks with one trainable
layer the gradient can be computed directly, however for networks with more than one
trainable layer the gradient calculation needs to propagate the error back, as the criterion
function gives errors only at the outputs. Such networks, like MLPs require this error back
propagation process. The result is the error backpropagation learning algorithm, which
calculates the gradients using the chain rule of derivative calculus. Because of the need of
propagating the error back to the hidden layers, the training of a multi-layer network may
be rather computation intensive.
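The gradient update of Eq. (36) combined with chain-rule backpropagation can be sketched for a one-hidden-layer MLP with a squared-error criterion. The target function, network size, learning rate and iteration count are all assumptions of this demo.

```python
import numpy as np

# Steepest-descent training (Eq. (36)) with error backpropagation for a
# one-hidden-layer MLP and a squared-error criterion.
rng = np.random.default_rng(7)
x = np.linspace(-1.0, 1.0, 40)[None, :]           # 1 x P inputs
y = x ** 2                                        # target function, assumed

M, mu = 10, 0.05                                  # hidden size, learning rate
W1 = 0.5 * rng.standard_normal((M, 2))            # hidden weights (incl. bias)
w2 = 0.5 * rng.standard_normal((1, M))            # output weights

X = np.vstack([x, np.ones_like(x)])               # augmented input (bias row)
for _ in range(3000):
    h = np.tanh(W1 @ X)                           # forward: hidden layer
    y_m = w2 @ h                                  # forward: linear output
    e = y_m - y                                   # output error
    # backward: the chain rule propagates the error to each layer's weights
    grad_w2 = e @ h.T
    grad_W1 = ((w2.T @ e) * (1.0 - h ** 2)) @ X.T
    w2 -= mu / x.size * grad_w2                   # steepest-descent updates
    W1 -= mu / x.size * grad_W1

print(np.mean((w2 @ np.tanh(W1 @ X) - y) ** 2))   # small training error
```

The factor (1 − h²) is the derivative of tanh evaluated at the hidden activations; it is exactly the term by which the output error is propagated back through the hidden layer.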
Moreover, the error function for networks with more than one trainable layer may be highly nonlinear and there may exist many minima in the error surface. These networks, like MLPs, implement nonlinear mappings which are at least partly nonlinear-in-the-parameters. Among the minima there may be one or more for which the value of the error is the smallest; this is (these are) the global minimum (minima), and all the other minimum points are called local minima. For nonlinear-in-the-parameters error surfaces we cannot find general closed-form solutions; instead, iterative, usually gradient-based, methods are used. Although an iterative gradient-based algorithm does not guarantee that the global minimum will be reached, the learning rules applied for nonlinear-in-the-parameters neural networks are usually also gradient-based algorithms.
A more general gradient-based learning rule can be written as:
w(k+1) = w(k) + μ Q(−∇(k)),   (37)
where Q is a matrix, which modifies the search direction and which usually reflects some
knowledge about the error surface.
Several different gradient rules can be derived from this general one by specifying Q. If Q = I, the identity matrix, we get the steepest descent algorithm (Eq. 36). With Q = H⁻¹ and μ = 1/2 the Newton algorithm is obtained, where H⁻¹ is the inverse of the Hessian of the criterion function. The Hessian matrix is defined by

H = ∇∇C(θ) = [∂²C(θ) / ∂θ_i ∂θ_j].   (38)
From the general form of the gradient learning rule the Levenberg-Marquardt rule [16] can also be obtained; in this case an approximation of the Hessian is applied to reduce the computational complexity. These gradient-based algorithms can reach the minimum using fewer learning iterations; however, one iteration requires more complex computations than in the simple steepest descent method.
4.7.2 Training of dynamic networks
The learning rules discussed so far can be applied to static neural networks. For training dynamic networks some additional problems must be solved. Dynamic networks are sequential networks, which means that they implement nonlinear mappings between input and output data sequences. So the training samples of input-output data pairs of static networks are replaced by input-output data sequences, and the goal of the training is to reduce a squared error derived from the elements of the corresponding error sequences. If e(k) is the output error of a dynamic network at discrete time step k, the squared total error can be defined as:

C = Σ_k e²(k).   (39)
$C = \sum_{k} e^2(k)$  (39)
Another method to train a recurrent network is real-time recurrent learning (RTRL), where the evolution of the gradient over time steps can be written in recursive form [27]. In RTRL the weights are modified at every time step. This violates the requirement of updating the weights only after a whole training sequence has been applied; however, it was found that updating the weights after each time step works well as long as the learning rate $\mu$ is kept sufficiently small. A sufficiently small learning rate means that the time scale of the weight changes is much slower than the time scale of the network operation. Real-time recurrent learning avoids the need for allocating memory proportional to the maximum sequence length and leads to rather simple implementations.
During training all training data are usually used many times. The number of training
cycles may be quite large, and it is important to find when to stop training. To determine
the optimal stopping time the performance of the network must be checked, the network
must be validated. So validation not only helps to determine the proper complexity of the network, as was indicated before; it is also used to decide whether to stop training at a given training cycle.
The question of optimal model complexity can be discussed from another point of view.
This is the bias-variance trade-off. The significance of bias-variance trade-off can be
shown if the modeling error is decomposed into a bias and a variance term. As it was
defined by Eq. (5), the modeling error is the sum of the squared errors, or the average of the squared error:

$C(\mathbf{w}) = \frac{1}{P}\sum_{k=1}^{P}\left[y(k) - f\left(\varphi(k), \mathbf{w}\right)\right]^2$  (40)

This error definition is valid for all model structures: if $\varphi(k) = x(k)$ we have a static model, and if $\varphi(k)$ is one of the regressors defined in section 6, it refers to the error of a dynamic network.
Now consider the limit in which the number of training data samples goes to infinity: the average of the squared error approaches the mean square error, the expected value of the squared error, where the expectation is taken over the whole data set:

$C = E\left\{\left[y - f(\varphi, \mathbf{w})\right]^2\right\}$  (42)

This expression can be decomposed as:

$C = E\left\{\left[f(\varphi, \mathbf{w}) - E\{f(\varphi, \mathbf{w})\}\right]^2\right\} + \left[E\{f(\varphi, \mathbf{w})\} - y\right]^2$  (43)

Here the first term is the variance and the second one is the squared bias:

$\mathrm{var} = E\left\{\left[f - E\{f\}\right]^2\right\}, \qquad \mathrm{bias} = E\{f\} - y$  (44)
The size of the model, the model order, will have an effect on the bias-variance trade-off. A small model with too few free parameters will not have enough complexity to represent the variability of the system's mapping: the bias will generally be high, while the variance is small. A model with too many parameters can fit all training data perfectly, even if they are noisy. In this case the bias term vanishes, or at least decreases, but the variance will be significant (Figure 14).
In static neural models the model complexity can be adjusted by the number of the
hidden neurons. In dynamic models, however, this question is more complex. First a proper
size of the selected model class must be determined, e.g., for an NFIR architecture we have
to select the length of the tapped delay line, or for a NARX or a NARMAX model the
lengths of the corresponding tapped delay lines, etc., then the number of hidden neurons
which implement the nonlinear mapping have to be determined. Moreover, it can be shown
that the selection of the proper model complexity cannot be done independently from the
number of available training data samples. There must be some balance between model
complexity and the number of training data. The fewer training points are used, the less knowledge is available about the system and the fewer free parameters can be used to get a model of good generalization. Of course model complexity must reflect the complexity of
the system, more complex systems need more data, which allows building more complex
models: models with more parameters.
The question of model complexity versus number of training points and model
performance (generalization capability) has been studied from different points of view. One
early result for static neural networks gives an upper bound of MSE as a function of the
smoothness of the mapping to be approximated, the complexity of the network and the
number of training points [28].
$E_{MSE} \le O\!\left(\frac{C_f^2}{M}\right) + O\!\left(\frac{MN}{P}\ln P\right)$  (45)

where $C_f$ characterizes the smoothness of the mapping, M is the number of hidden neurons, N is the input dimension and P is the number of training points. Another important result is the generalization bound of statistical learning theory:

$R(\mathbf{w}) \le R_{emp}(\mathbf{w}) + \varepsilon(P, h, \eta)$  (46)

Here

$\varepsilon(P, h, \eta) = \sqrt{\frac{h\left(\ln(2P/h) + 1\right) - \ln(\eta/4)}{P}}$  (47)

where h is the VC-dimension. The VC-dimension is a characteristic parameter of the function set used in the approximation. For the validity of Eq. (46) we need that the probability of observing large values of the error is small [30]. It can be proved that models with good generalization property can be obtained only if h is finite [29]. The generalization bound of Eq. (46) is particularly important for model selection, since it provides an upper limit for complexity for a given sample size P and confidence level $\eta$.
4.8.1 Model order selection for dynamic networks
For dynamic systems modeling, proper model order selection is especially important. As the correct model order is often not known a priori, it makes sense to postulate several different model orders. Based on these, some criterion can be computed that indicates which model order to choose. One intuitive approach would be to construct models of increasing order until the computed squared error reaches a minimum. However, as was shown previously,
the training error decreases monotonically with increasing model order. Thus, the training
error alone might not be sufficient to indicate when to terminate the search for the proper
model complexity; model complexity must be penalized to avoid using too complex model
structures.
Based on this approach several general criteria were proposed. The most important ones are the Akaike Information Criterion (AIC) [31] and the Minimum Description Length (MDL) [32], which were developed for linear system modeling. Recently for MLPs a
network information criterion (NIC) was proposed by Amari [33], which was derived from
AIC. The common feature of these criteria is that they have two terms: the first one
depends on the approximation error for the training data (i.e. the empirical error), while the
second is a penalty term. This penalty grows with the number of free parameters. Thus, if
the model is too simple it will give a large value for the criterion because the residual
training error is large, while a too complex model will have a large value for the criterion
because the complexity term is large.
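The two-term structure of these criteria can be sketched as follows, using the common linear-modeling forms (AIC = P ln MSE + 2k, MDL = P ln MSE + k ln P; exact forms vary between sources, and the MSE sequence below is fabricated purely for illustration):

```python
import numpy as np

def aic(mse, k, P):
    """Akaike-style criterion: error term plus a 2k complexity penalty."""
    return P * np.log(mse) + 2 * k

def mdl(mse, k, P):
    """MDL-style criterion: the penalty grows as k ln(P) instead of 2k."""
    return P * np.log(mse) + k * np.log(P)

# Training MSE decreases monotonically with model order k (values fabricated
# for illustration), but both criteria turn back up once the improvement no
# longer pays for the added parameters.
P = 200
mse_by_order = {1: 0.90, 2: 0.30, 3: 0.10, 4: 0.0995, 5: 0.0993, 6: 0.0992}
best_aic = min(mse_by_order, key=lambda k: aic(mse_by_order[k], k, P))
best_mdl = min(mse_by_order, key=lambda k: mdl(mse_by_order[k], k, P))
```

Both criteria pick order 3 here: beyond it the tiny drop in training MSE no longer compensates for the growing penalty term.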
The methods based on the different criteria need to build and analyze several different models, so they are rather computation-intensive and their applicability in practical cases is questionable. Recently a new heuristic method was proposed for identifying the orders of input-output models for unknown nonlinear dynamic systems [34].
This approach is based on the continuity property of the nonlinear functions, which
represent input-output mappings of continuous dynamic systems. The interesting and
attractive feature of this approach is that it solely depends on the training data. The model
orders can be determined using the following index:
$q^{(N)} = \left(\prod_{k=1}^{p} \sqrt{N}\, q^{(N)}(k)\right)^{1/p}$  (48)

where $q^{(N)}(k)$ is the k-th largest Lipschitz quotient among all $q_{ij}^{(N)}$ ($i \ne j$; $i, j = 1, 2, \ldots, P$), N is the number of input variables and p is a positive number, usually 0.01P-0.02P. Here the $q_{ij}^{(N)}$ Lipschitz quotient is defined as:
$q_{ij}^{(N)} = \frac{|y(i) - y(j)|}{\|x(i) - x(j)\|}, \qquad i \ne j$  (49)

where the $\{x(i), y(i)\}$, $i = 1, 2, \ldots, P$ pairs are the measured input-output data samples from which the nonlinear function $f(\cdot)$ has to be reconstructed. This index has the property that $q^{(N+1)}$ is very close to $q^{(N)}$, while $q^{(N-1)}$ is much larger than $q^{(N)}$, if N is the optimal number of input variables; so a typical curve of $q^{(N)}$ versus N has a definite point ($N_0$) where the decreasing tendency stops and $q^{(N)}$ enters a saturated range. For an NFIR model $N_0$ is the optimal input order. Figure 15 (a) shows a typical curve for $q^{(N)}$.
The Lipschitz index can be applied not only for NFIR structures but also for the NARX model class, where two orders, those of the feed-forward and the feedback paths, must be determined. For the NARX model class

$y(k) = f\left(y(k-1), \ldots, y(k-L),\ x(k-1), \ldots, x(k-M)\right)$  (50)

the following strategy can be used. The Lipschitz index $q^{(N)} = q^{(L+M)}$ should be computed for different model orders, where L denotes the feedback and M the feed-forward order values. Starting with N = 1, where only y(k-1) is used as input, $q^{(1+0)}$ can be computed. Then let N = 2, where both x(k-1) and y(k-1) are used as inputs and $q^{(1+1)}$ can be computed. For N = 3 the third input of the dynamic network will be y(k-2) and $q^{(2+1)}$ will be computed. This strategy can be followed, increasing step by step the feedback and the feed-forward orders. If at a given L and M one can observe that $q^{(L+M)}$ is much smaller than $q^{(L-1+M)}$ or $q^{(L+M-1)}$ but is very close to $q^{(L+1+M)}$ or $q^{(L+M+1)}$, we have reached the appropriate order values.
The most important advantage of this method is that it can give an estimate of the model
order without building and validating different complexity models, so it is a much more
efficient way of order estimation than the criteria-based approaches. However, there is a
significant weakness of the Lipschitz method: it is highly sensitive to observation noise.
Using noisy data for model construction - depending on the noise level - we can often get a
typical curve for the Lipschitz index as shown in Figure 15 (b). The most important feature
of this figure is that there is no definite break point.
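A sketch of the index of Eqs. (48)-(49) is given below; the toy second-order NFIR system, signal length and random seed are made-up illustration choices, not from the text.

```python
import numpy as np
from itertools import combinations

def lipschitz_index(X, y, p=None):
    """Lipschitz index in the spirit of Eq. (48): geometric mean of the p
    largest quotients |y(i)-y(j)| / ||x(i)-x(j)|| (Eq. 49), including the
    sqrt(N) scaling factor used in some formulations."""
    P, N = X.shape
    if p is None:
        p = max(1, int(0.02 * P))        # text: p is usually 0.01P-0.02P
    quotients = []
    for i, j in combinations(range(P), 2):
        d = np.linalg.norm(X[i] - X[j])
        if d > 0.0:
            quotients.append(abs(y[i] - y[j]) / d)
    largest = np.sort(quotients)[-p:]
    return float(np.exp(np.mean(np.log(np.sqrt(N) * largest))))

# Toy NFIR system y(k) = f(x(k-1), x(k-2)); system and sizes are made up.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 203)
k = np.arange(3, 203)
y = 0.6 * x[k - 1] ** 2 + 0.4 * np.sin(3.0 * x[k - 2])

def regressors(n):
    """Input matrix with columns x(k-1), ..., x(k-n)."""
    return np.column_stack([x[k - m] for m in range(1, n + 1)])

q1, q2, q3 = (lipschitz_index(regressors(n), y) for n in (1, 2, 3))
# q1 >> q2 while q3 stays close to q2: the curve saturates at N0 = 2,
# the true input order.
```

On noiseless data the break point at $N_0 = 2$ is sharp; adding observation noise to y blurs it, which is exactly the weakness Figure 15 (b) illustrates.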
4.8.2 Cross-validation
Modeling error can be used in another way for model validation. This technique is called
cross-validation. In cross-validation - as it was mentioned before - the available data set is
separated into two parts, a training set and a test set. The basic idea of cross-validation is
that one part of the available data set is used for model construction and another part for
validation. Cross-validation is a standard tool in statistics [35] and can be used both for model structure selection and for parameter estimation. Here its role in the training process will be presented.
The previous validation techniques for selecting the proper model structure and size are
rather complex, computation intensive methods. This is the most important reason why they
are applied only rarely in practical neural model construction. The most common practical
way of selecting the size of a neural network is the trial and error approach. First a network
structure is selected, then the parameters are trained. Cross-validation is used to decide
whether or not the performance of the trained network is good enough. Cross-validation,
however, is used for another purpose too.
As it was mentioned in the previous section, determining the stopping time of training is rather difficult, as a network with a quite large number of free parameters can learn the training data almost perfectly. The more training cycles are applied, the smaller the error that can be achieved on the training set. However, a small training error does not guarantee good generalization. Generalization capability can be measured using a set of test data consisting of samples never seen during training.
Figure 16 shows two learning curves, the learning curves of the training and the test
data. It shows, that usually the training error is smaller than the test error, and both curves
decrease monotonically with the number of training iterations until a point from which the learning curve for the test set starts to increase. The phenomenon when the training error keeps decreasing while the test error starts to increase is called overlearning or overfitting. In this case the network memorizes the training points more and more, while at the test points the network's response gets worse: we get a network with poor generalization. Overlearning can be avoided if training is stopped at the minimum point of the test learning curve. This is called early stopping, and it is an effective way to improve the generalization of the network even if its size is larger than required.
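The early-stopping logic can be sketched generically; the function names and the patience heuristic below are illustrative assumptions, since the text itself only requires stopping at the minimum of the test-error curve.

```python
import numpy as np

def train_with_early_stopping(step, evaluate, max_cycles=500, patience=20):
    """Generic early-stopping loop. `step()` runs one training cycle and
    returns the current weights; `evaluate(w)` returns the test-set error.
    Training stops once the test error has not improved for `patience`
    cycles, and the weights from its minimum are returned."""
    best_err, best_w, since_best = np.inf, None, 0
    for _ in range(max_cycles):
        w = step()
        err = evaluate(w)
        if err < best_err:
            best_err, best_w, since_best = err, w.copy(), 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_w, best_err

# Toy check: a fabricated test-error curve that first decreases, then rises
# (the overfitting pattern of Figure 16); the loop should keep the weights
# from the minimum at cycle 30.
errs = [1.0 / (t + 1) + 0.002 * max(0, t - 30) for t in range(500)]
state = {"t": -1}
def step():
    state["t"] += 1
    return np.array([float(state["t"])])
def evaluate(w):
    return errs[int(w[0])]
w_best, e_best = train_with_early_stopping(step, evaluate)
```

Keeping a copy of the best weights is essential: by the time the rise in test error is detected, the current weights are already past the optimum.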
For cross-validation we need a training set and a test set of known examples. However, there is a question which must be answered: in what ratio should the data points be divided into training and testing sets in order to obtain the optimum performance? Using statistical
theory a definite answer can be given to this question [36]. When the number of network parameters M is large, the best strategy is to use almost all available known examples in the training set and only a fraction of about $1/\sqrt{2M}$ of the examples in the testing set; e.g., when M = 100, this means that only 7% of the data points are to be used in the test set to determine the point for
early stopping. These results were confirmed by large-scale simulations. The results show
that when P > 30M cross-validation is not necessary, because the generalization error actually becomes worse when test data are set aside to determine the stopping time. However, for P < 30M,
i.e. the number of the known examples is relatively small compared to the number of
network parameters, overtraining occurs and using cross-validation and early stopping
improves generalization.
Cross-validation can be used not only for finding the optimal stopping point, but to
estimate the generalization error of the network too. In network validation several versions
parameters, however, may be very prone to overfitting. These general statements are more or less valid for all modeling approaches, among them for neural networks. MLPs trained with the backpropagation learning rule, however, have a special feature. They may be biased towards implementing smooth interpolation between the training points, which means that they may have rather limited proneness to overfitting.
The effect of this bias is that even when using an overly complex neural model, overfitting can be avoided. Backpropagation can result in the underutilization of network resources, mainly in the beginning phase of learning, and this can be definitely observed on the training curves. As it was shown in Figure 16, overlearning can be avoided using early stopping. This behavior of MLPs with backpropagation is justified by extensive experimental studies (e.g., [37]), and by explicit analysis, which shows that neural modeling is often ill conditioned: the effective number of parameters is much less than the nominal number of the network parameters [38,39].
During learning a network can be forced to reduce the number of effective parameters using regularization, as it was discussed in section 7. However, for MLPs with backpropagation training an implicit regularization, a regularization effect without using an explicit regularization term, can be observed. The resulting smooth mapping is an advantageous feature of neural identification as long as the systems to be modeled are continuous ones. Although this implicit regularization cannot be found in other neural networks, similar properties can be obtained easily using some form of explicit regularization, so some inductive bias that is characterized as smooth interpolation between training points can be found not only in MLPs with backpropagation learning, but in RBF or even in CMAC networks.
4.10. Modeling of a complex industrial process using neural networks: special
difficulties and solutions (case study)
In industry many complex modeling problems can be found where exact or even
approximate theoretical/mathematical relationship between input and output cannot be
formulated. The reasons behind this can be the unsatisfactory knowledge we have about the
basic underlying physical behavior, chemical reactions, etc., or the high complexity of the
input-output relationship. At the same time there is a possibility to collect observations
from the system, we can measure input and output data, so an experimental black box
model based on the observations can be constructed.
In the previous sections of this paper many general questions of black box modeling and
neural networks were discussed. In this section some practical questions will be addressed
through a real-world complex industrial modeling example: modeling of a Linz-Donawitz
(LD) steel converter.
4.10.1 LD steel-making
Steel-making with an LD converter is a complex physico-chemical process where many parameters influence the quality of the resulting steel [40,41]. The complexity of the whole process and the fact that there are many effects that cannot be taken into consideration make this task difficult. The main features of the process are the following: a large (~150-ton) converter is filled with waste iron (~30 tons), molten pig iron (~110 tons) and many additives, then pure oxygen is blown through this fluid compound to oxidize the unwanted contaminants (e.g., silicon, most of the carbon, etc.).
At the end of the oxygen blowing the quality of the steel is tested and its temperature is
measured. If the main quality parameters and the temperature at the end of the steel-making
process are within the acceptable and rather narrow range, the whole process is finished and the slag and the steel are tapped off for further processing.
The quality of the steel is influenced by many parameters, however the amount of
oxygen used during blasting is the main parameter that can be controlled to obtain
steel of predetermined quality. From the point of view of steel-making, the parameters are the main features and measurement data of the components of the input compounds, e.g., the mass, temperature and quality parameters of the pig iron and the waste iron, the mass and some quality parameters of all additives, as well as the amount of oxygen used during the blasting process, etc. It is an important and rather hard task to create a reliable predictor for
determining the necessary amount of oxygen. To give a reliable prediction we have to
know the relation between the input and the output parameters of the steel-making process,
therefore we have to build a model of the steel converter. The inputs of the model are formed by all available observations that can be obtained from a charge. The outputs are the
most important quality parameters of the steel produced, namely its temperature and the
carbon content at the end of the blasting.
To present all details of such a complex modeling task is well beyond the scope of this paper, so the goal of this section is not to go into the details, but to point out that besides the basic tasks of system identification mentioned in the previous sections there are important additional ones which cannot be neglected.
A large part of these additional tasks are related to the database construction.
4.10.2 Data base construction for black box identification
In black box modeling the primary knowledge that can be used for model building is a
collection of input-output data. So the first task of modeling is to build a proper data base.
One serious problem in real-world tasks is that in many cases the number of available data
is limited and rather small.
In steel-making the data base can be built only from measurements and observations
done during the regular everyday operation of the converter. Steel-making is a typical
example where there is no possibility to design special excitation signals and to design
experiments for data collection.
Steel production with an LD-converter is organized in campaigns. During one campaign the production is continuous, and in one campaign about 3000 charges of steel are produced. This means that the maximum number of known examples is limited and cannot be increased. Moreover, the data base collected in one campaign contains typical and special cases, where the data of the special cases cannot be used for modeling because of technological reasons. The ratio of special-to-all cases is rather high, around 25-30%. The only possibility to increase the size of the data base is to collect data from more campaigns; however, from campaign to campaign the physical parameters of the steel converter change significantly, and these changes must be followed by the model as well, so one should take care when and how to use the extended data set.
In forming a proper database the following problems have to be considered:
- the problem of dimensionality,
- the problem of uneven distribution of data,
- the problem of noisy and imprecise data,
- the problem of missing data,
- the effects of the correlation between consecutive data.
The problem of dimensionality is often referred to as the curse of dimensionality. For neural modeling we need representative data, which cover the whole input space. This means that - depending on the dimension of the input space - a rather large number of training and test patterns is required. If N-dimensional inputs are used and each input component can take R different values in its validity range, the number of all possible input data samples is $R^N$, so it grows exponentially with the dimensionality of the input space. This means that
dimension reduction is an important step, especially when the number of training samples
cannot be increased arbitrarily. To reduce dimension the following two main approaches
can be used:
- Applying some mathematical data compression algorithms, like independent component
analysis (ICA), principal component analysis (PCA) or factor analysis. The basic
thought behind this approach is that the components of the input data vectors are usually
correlated, so without significantly reducing their "information content" less new
components can be formed from the original ones.
- By analyzing the raw data and using domain knowledge, the rank of importance of the
data components can be estimated and the less important components can be omitted.
In some cases the two approaches can be combined: first - using domain knowledge - we
can select the most important input parameters, then on the selected data mathematical data
compression algorithms can be applied. In the steel-making problem both methods were
considered for reducing the dimension of the observed data, however, the reduction based
on domain knowledge proved to be more useful. Instead of using all recorded data, only the ~20 most important input components of the original ~50-component data records were used during the training.
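The mathematical-compression route can be sketched with PCA. The toy data set below is a made-up illustration (10 recorded components that are noisy mixtures of 3 underlying factors), mimicking the correlated process measurements the text describes.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto its top principal components; returns the
    reduced data and the fraction of total variance retained."""
    Xc = X - X.mean(axis=0)                         # center the data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                    # variance per component
    retained = var[:n_components].sum() / var.sum()
    return Xc @ Vt[:n_components].T, retained

# Ten recorded components that are noisy mixtures of three underlying
# factors -- correlated, as industrial process measurements typically are.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 10))
X = factors @ mixing + 0.05 * rng.normal(size=(500, 10))
X_red, retained = pca_reduce(X, 3)
```

Three principal components retain almost all of the variance of the ten correlated components, which is exactly the situation that justifies this kind of reduction.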
The importance of the components was determined using detailed analysis of the data
and by the results of some preliminary trained networks. These trained networks were used
to determine the sensitivity of the model output to the input components. It turned out that
there are some components that have very limited effect on the results, so they could be
omitted without significant degradation in the performance of the model. Extensive discussions with skilled personnel of the steel factory about the role of the input components also helped us to select the most important ones.
As a result three major groups were formed. The first group contained measurement
data of clearly high importance, such as mass and temperature values, the waiting time
between the finishing of a charge and the start of the next one (this waiting time has an
effect on the temperature of the converter before filling it with the new workload). The
second group contained clearly negligible data, while the third group contained data of
questionable importance. The third group was tested by building several neural models
based on the same records of the initial data base, but where the input components of the
records were different.
Comparing the performances of the trained networks and analyzing the sensitivity of the
model outputs to the different input components the most relevant ones were selected. After
5-10 experiments we could reduce the input parameters from the starting 50 to about 20.
Another common feature of industrial problems is that the input data are typically not uniformly distributed over their possible ranges. This means that there may be some clusters, and within these clusters quite a lot of representative data points are available, while there may be other parts of the input space from where only a few examples can be collected. For operating modes from which many data can be collected, appropriate models can be constructed, while in underrepresented operating modes the available data are not enough to build proper black box models.
A further problem is that due to the industrial environment the registered data are
frequently inaccurate and unreliable. Some of the parameters are measured values (e.g.,
temperature of pig iron), others are estimated values (e.g., the ratio of the different
components of the waste iron), where the acceptable ranges of the values are quite large. It
is also typical that some measurements are missing from a record. The precision of the
values is rather different even in the case of measured data. If wrong or suspicious data are
found, or in case of missing data there are two possibilities: either the data can be corrected,
or the whole record is cancelled. Correction is preferred, because of the mentioned
dimensionality problem. The large dimensionality and the limited number of data examples make it very important to save as many patterns as possible.
[Figure: flowchart of the input-component selection process: starting from the initial database, sensitivity analysis identifies input components with a small effect on the output, which are cancelled to form a new database.]
Handling of noisy data is a general problem of black box modeling. The methods
developed for this problem need some additional information (at least some statistical
properties of the measurement noise) and using this additional information a more robust
model can be built. One such method is the Errors In Variables (EIV) approach, but Support Vector Machines (SVMs) can also take the noise level into consideration.
The Errors In Variables training method was introduced to reduce the negative effects
of measurement noise [42]. The idea behind the method is that knowing some properties of
the additive noise, the training process can be modified to compensate for the error effects. In the EIV approach, instead of the standard quadratic criterion function, a new weighted quadratic criterion function is used, where the weights are the reciprocal values of the
variances of the corresponding measurement noise:

$C_{EIV} = \frac{1}{P}\sum_{i=1}^{P}\left[\frac{\left(y(i) - \hat{y}(i)\right)^2}{\sigma_{y,i}^2} + \frac{\left(x(i) - x^*(i)\right)^2}{\sigma_{x,i}^2}\right]$  (51)

In this expression $\{y(i), x(i)\}$, $i = 1, 2, \ldots, P$ denote the measured noisy input-output training examples, $x^*(i)$ denote the noiseless, and naturally not known, inputs (during the EIV method estimates of these inputs are also determined), and $\sigma_{x,i}^2$ and $\sigma_{y,i}^2$ are the variances of the input and output noise, respectively. The classical LS estimation results in biased
estimates of the model parameters if the input data are noisy. The most attractive feature of the EIV approach is that it can reduce this bias. This property can be proved when it is applied for training neural networks [43]. The drawback of EIV is its larger computational complexity and the fact that with the EIV criterion function the learning process is very prone to overfitting. This latter effect, however, can be avoided using early stopping.
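The weighted criterion of Eq. (51) can be sketched as follows. All numbers are made up, and in a real EIV run the noiseless input estimates x* are themselves optimized together with the model parameters; here they are fixed purely to show how the weighting works.

```python
import numpy as np

def eiv_criterion(y, y_model, x, x_est, var_y, var_x):
    """Weighted quadratic criterion in the spirit of Eq. (51): output and
    input-estimate residuals are each weighted by the reciprocal of the
    corresponding measurement-noise variance."""
    return np.sum((y - y_model) ** 2 / var_y) + np.sum((x - x_est) ** 2 / var_x)

# With identical residuals, doubling a channel's assumed noise variance
# halves its weight in the criterion (all numbers below are made up).
y = np.array([1.0, 2.0]); y_model = np.array([1.1, 1.9])
x = np.array([0.5, 0.7]); x_est = np.array([0.45, 0.75])
c_lo = eiv_criterion(y, y_model, x, x_est, var_y=0.01, var_x=0.01)
c_hi = eiv_criterion(y, y_model, x, x_est, var_y=0.02, var_x=0.02)
```

Less reliable measurements thus influence the criterion less, which is the mechanism by which EIV reduces the bias caused by input noise.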
Support Vector Machines also apply a criterion function that can take the measurement noise into consideration. The criterion function used in SVM is the $\varepsilon$-insensitive function given by Eq. (52):

$e_\varepsilon(y, \hat{y}) = \begin{cases} |y - \hat{y}| - \varepsilon & \text{for } |y - \hat{y}| \ge \varepsilon \\ 0 & \text{otherwise} \end{cases}$  (52)
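Eq. (52) translates directly into code; this is a sketch, and the tolerance value below is an arbitrary illustration choice.

```python
import numpy as np

def eps_insensitive(y, y_model, eps):
    """Epsilon-insensitive loss of Eq. (52): errors within +/-eps cost
    nothing; larger errors are penalized by their excess over eps."""
    return np.maximum(np.abs(y - y_model) - eps, 0.0)

r = eps_insensitive(np.array([1.0, 1.0, 1.0]),
                    np.array([1.05, 1.30, 0.60]), eps=0.1)
# The first error (0.05) falls inside the tube and costs nothing.
```

Errors smaller than the assumed noise level are thus ignored entirely, which is how the SVM criterion takes the noise into consideration.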
Using SVMs, the steps of the neural network construction are rather different from those of the classical neural network approach. An interesting feature of Support Vector Machines is that the size of the model, the model complexity, is determined "automatically"
some information, that the blowing process is greatly different from the standard one, the
goal parameters are rather special which occurs rarely, etc.).
This type of modular architecture consists of such models from which one and only one
is used in a given case. Other modular architectures can also be constructed where different
neural models are cooperating. Instead of using a single neural model an ensemble of
models can be used.
There are heuristic and mathematical motivations that justify the use of ensembles of networks. According to the heuristic explanation, combining several different networks can often improve the performance, however, only if the models implemented by the elements of an ensemble are different.
The advantage of using an ensemble of neural networks can also be justified by a simple
mathematical analysis [45]. Let us consider the task of modeling a system's mapping f. We assume that we can obtain only noisy samples of this mapping and assume that an ensemble of T independent neural models is available. We define a modular architecture using the ensemble of models, and the final output of the ensemble is given by a weighted average as:

$\hat{y}(x) = \sum_{j=1}^{T} \alpha_j \hat{y}_j(x)$  (54)

where $\hat{y}_j$ is the output of the j-th model. We can define two quality measures, the ambiguity and the squared error, for every member of the ensemble and for the whole ensemble. The ambiguity of a single member of the ensemble is

$a_j(x) = \left[\hat{y}_j(x) - \hat{y}(x)\right]^2$  (55)
This quantifies the disagreement among the models on input x. Similarly, the quadratic error of model j and of the whole ensemble are defined as

$e_j(x) = \left[y(x) - \hat{y}_j(x)\right]^2$  (56)

and

$e(x) = \left[y(x) - \hat{y}(x)\right]^2$  (57)
It can be shown easily that the ensemble quadratic error can be written as:

$e(x) = \bar{e}(x, \alpha) - \bar{a}(x, \alpha)$  (58)

if $\sum_{j=1}^{T} \alpha_j = 1$. In Eq. (58), $\bar{e}(x, \alpha) = \sum_{j=1}^{T} \alpha_j e_j(x)$ is the weighted error and $\bar{a}(x, \alpha) = \sum_{j=1}^{T} \alpha_j a_j(x)$ is the weighted ambiguity of the models as defined by Eq. (55). Eq. (58) shows that the ensemble quadratic error on x can be expressed as the difference between the weighted error and the weighted ambiguity. Taking expectations according to the input distribution we get the average ensemble generalization error

$E = \bar{E} - \bar{A}$  (59)

where $\bar{E}$ denotes the expected value of $\bar{e}(x, \alpha)$ and $\bar{A}$ that of $\bar{a}(x, \alpha)$. This
expression shows that for getting small ensemble generalization error we need accurate and
diverse individual models, i.e. they must be as accurate as possible while they must
disagree.
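The decomposition of the ensemble error can be verified numerically; the model outputs below are random made-up values, and any weights summing to one work.

```python
import numpy as np

# Made-up ensemble of T = 5 model outputs at one fixed input x.
rng = np.random.default_rng(2)
T = 5
alpha = np.full(T, 1.0 / T)                  # weights summing to one
y_true = 0.7                                 # target y(x)
y_models = y_true + rng.normal(0.0, 0.3, T)  # diverse member outputs

y_ens = alpha @ y_models                           # ensemble output, Eq. (54)
weighted_err = alpha @ (y_true - y_models) ** 2    # weighted member error
ambiguity = alpha @ (y_models - y_ens) ** 2        # weighted ambiguity, Eq. (55)
ens_err = (y_true - y_ens) ** 2                    # ensemble error, Eq. (57)
# Eq. (58): ens_err == weighted_err - ambiguity (up to rounding).
```

Since the ambiguity term is subtracted, disagreement among equally accurate members directly lowers the ensemble error, which is the formal version of the "accurate and diverse" requirement.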
The weights of the individual networks in the ensemble can be estimated from the training examples too. There are different ways to perform this estimation: one possibility is to use a mixture of experts (MOE) architecture [46], where the $\alpha_j$ weights as well as the weights of the neural networks are estimated in a joint training process, and where the results of training are the maximum likelihood estimates of the needed values. The values of the $\alpha_j$ weights depend on the inputs of the models, and they are implemented as outputs of an auxiliary network called the gating network.
Figure 19: The hybrid-neural modeling system.
The second layer contains the direct modeling devices. It is formed from different
neural models that can work with the data belonging to different clusters. In some cases such models cannot be used alone; it may happen that they should be used together with certain correction terms that modify the result of a neural model. The system makes it
possible to build any other modeling device (e.g., mathematical models or expert systems)
into this layer in addition to the neural models. However, at present neither mathematical models nor expert systems can compete with the neural ones. So far only such mathematical models could be formed that gave reliable predictions in a small neighborhood of some special working points. These models can be used in the validation of the neural models, or in the explanation generation (see below).
The third or output layer is the decision-maker of the whole modeling system. It has two
main tasks: to validate the results, and to make the final prediction using some direct
information from the first layer. This layer also uses symbolic rules. It validates the result of the second layer and decides whether the result can be accepted at all. This decision-making is based on different information: for example, some direct information from the input layer, or information obtained from more than one expert of the second layer. As
an example for the first case it may happen that the input data are so special that there is no
valid model for them in the second layer. Although it is a rare situation, this must be
detected by the input expert system and the whole system must be able to give some valid
answer even in such cases. This answer informs the staff that in this special case the whole
system cannot give reliable output, they must determine it using any other (e.g.,
conventional) method. In the second case validation is based on the results of more than
one expert modules of the second layer. Using these results the output expert system will
form the final answer, which may be some combination of the results of more experts or a
corrected value of a given expert. The correction term can be determined using the results
of other expert modules (e.g., other neural networks), or a separated expert system, the role
of which is to determine correction terms directly for the special cases.
A further important task of the output layer is explanation generation, which is also
based on built-in expert knowledge. As neural networks themselves form black-box
models, they cannot generate an explanation of the result automatically. However, without
explanation the acceptance of such results by an industrial community is rather questionable,
even if the results are quite good. The purpose of explanation generation is therefore to
increase the acceptance of the results of the modeling system.
4.11. Conclusions
The purpose of this paper was to give an overview of system identification and to show
the important role of neural networks in this field. It was shown that neural networks are
general black-box modeling devices with many attractive features: they are
universal approximators, and they offer adaptation capability, fault tolerance, robustness,
etc. For system modeling, several different static and dynamic neural architectures can be
constructed, so neural architectures are flexible enough for a rather large class of
identification tasks. The construction of neural models is, since they are black-box
architectures, mainly based on measurement data observed about the system. This is why one of the
most important parts of black-box modeling is the collection of as much relevant data as
possible, covering the whole operating range of interest. As the example of LD converter
modeling showed, the construction of the database requires solving many
additional problems: handling noisy data, missing data, and unreliable data, separating the
whole database into a training set and a test set, etc. All these problems need proper
preprocessing, the importance of which cannot be overemphasized.
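These preprocessing steps can be sketched as follows. The data, the outlier threshold, and the 80/20 split ratio below are purely illustrative assumptions, not values from the LD converter project:

```python
import numpy as np

# Minimal data-preparation sketch: drop records with missing values,
# remove gross outliers, then separate training and test sets.
rng = np.random.default_rng(42)
data = rng.normal(size=(100, 5))         # stand-in for measurement records
data[3, 2] = np.nan                      # a missing value
data[7, 0] = 50.0                        # an unreliable outlier

mask = ~np.isnan(data).any(axis=1)       # remove incomplete records
clean = data[mask]
clean = clean[np.all(np.abs(clean) < 10.0, axis=1)]  # remove outliers

idx = rng.permutation(len(clean))        # random train/test separation
split = int(0.8 * len(clean))
train, test = clean[idx[:split]], clean[idx[split:]]
```

In a real project the outlier rule would of course be based on physical plausibility limits of the measured quantities rather than a fixed numeric threshold.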
Moreover, according to the experience obtained from real-world modeling tasks, prior
information and any knowledge additional to the observations are of great importance. Prior
information helps us to select a proper model structure, to design the excitation signal if it is
possible to use excitation signals at all, to determine the operating range where a valid model
should be obtained, etc. An important implication of complex real-world
identification problems is that using only one approach, one paradigm, usually cannot
result in a satisfactory model. Combining different paradigms, however, can join the
advantages of the different approaches, can utilize different representations of knowledge,
and can help in understanding the result obtained. The latter is especially important in neural
modeling, because neural models cannot give an explanation of the model, and without
explanation the lack of physical meaning may reduce the acceptance of black-box
models even if their behavior is rather close to that of the system.
References
[1] L. Ljung, System Identification: Theory for the User, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1999.
[2] J. Schoukens and R. Pintelon, System Identification: A Frequency Domain Approach, IEEE Press, New York, 2001.
[3] T. Söderström and P. Stoica, System Identification, Prentice Hall, Englewood Cliffs, NJ, 1989.
[4] P. Eykhoff, System Identification: Parameter and State Estimation, Wiley, New York, 1974.
[5] A. P. Sage and J. L. Melsa, Estimation Theory with Application to Communications and Control, McGraw-Hill, New York, 1971.
[6] H. L. Van Trees, Detection, Estimation and Modulation Theory, Part I, Wiley, New York, 1968.
[7] G. C. Goodwin and R. L. Payne, Dynamic System Identification, Academic Press, New York, 1977.
[8] K. Hornik, M. Stinchcombe and H. White, Multilayer Feed-forward Networks are Universal Approximators, Neural Networks, Vol. 2, 1989, pp. 359-366.
[9] G. Cybenko, Approximation by Superposition of Sigmoidal Functions, Mathematics of Control, Signals and Systems, Vol. 2, 1989, pp. 303-314.
[10] K. I. Funahashi, On the Approximate Realization of Continuous Mappings by Neural Networks, Neural Networks, Vol. 2, No. 3, 1989, pp. 183-192.
[11] M. Leshno, V. Y. Lin, A. Pinkus and S. Schocken, Multilayer Feed-forward Networks with a Nonpolynomial Activation Function Can Approximate Any Function, Neural Networks, Vol. 6, 1993, pp. 861-867.
[12] J. S. Albus, A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), Transactions of the ASME, Sep. 1975, pp. 220-227.
[13] Y. H. Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, MA, 1989, pp. 197-222.
[14] D. F. Specht, Polynomial Neural Networks, Neural Networks, Vol. 3, No. 1, 1990, pp. 109-118.
[15] J. Park and I. W. Sandberg, Approximation and Radial-Basis-Function Networks, Neural Computation, Vol. 5, No. 2, 1993, pp. 305-316.
[16] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, Englewood Cliffs, NJ, 1999.
[17] M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, 1995.
[18] M. Brown and C. Harris, Neurofuzzy Adaptive Modelling and Control, Prentice Hall, New York, 1994.
[19] G. Horvath and T. Szabo, CMAC Neural Network with Improved Generalization Property for System Modelling, Proc. of the IEEE Instrumentation and Measurement Technology Conference, Anchorage, 2002.
[20] T. Szabo and G. Horvath, CMAC and its Extensions for Efficient System Modelling and Diagnosis, Intnl. Journal of Applied Mathematics and Computer Science, Vol. 9, No. 3, 1999, pp. 571-598.
[21] J. Hertz, A. Krogh and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.
[22] K. S. Narendra and K. Parthasarathy, Identification and Control of Dynamical Systems Using Neural Networks, IEEE Trans. on Neural Networks, Vol. 1, 1990.
[23] J. Sjöberg, Q. Zhang, L. Ljung, A. Benveniste, B. Delyon, P.-Y. Glorennec, H. Hjalmarsson and A. Juditsky, Non-linear Black-box Modeling in System Identification: A Unified Overview, Automatica, Vol. 31, 1995, pp. 1691-1724.
[24] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-posed Problems, W. H. Winston, Washington, DC, 1977.
[25] E. A. Wan, Temporal Backpropagation for FIR Neural Networks, Proc. of the 1990 IJCNN, Vol. I, pp. 575-580.
[26] D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning Internal Representations by Error Propagation, in D. E. Rumelhart and J. L. McClelland (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, 1986, pp. 318-362.
[27] R. J. Williams and D. Zipser, A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, Neural Computation, Vol. 1, 1989, pp. 270-280.
[28] A. R. Barron, Universal Approximation Bounds for Superpositions of a Sigmoidal Function, IEEE Trans. on Information Theory, Vol. 39, No. 3, 1993, pp. 930-945.
[29] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
[30] V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory and Methods, Wiley, New York, 1998.
[31] H. Akaike, Information Theory and an Extension of the Maximum Likelihood Principle, Second Intnl. Symposium on Information Theory, Akademiai Kiado, Budapest, 1972, pp. 267-281.
[32] J. Rissanen, Modelling by Shortest Data Description, Automatica, Vol. 14, 1978, pp. 465-471.
[33] N. Murata, S. Yoshizawa and S. Amari, Network Information Criterion: Determining the Number of Hidden Units for an Artificial Neural Network Model, IEEE Trans. on Neural Networks, Vol. 5, No. 6, pp. 865-871.
[34] X. He and H. Asada, A New Method for Identifying Orders of Input-Output Models for Nonlinear Dynamic Systems, Proc. of the American Control Conference, San Francisco, CA, 1993, pp. 2520-2523.
[35] M. Stone, Cross-Validatory Choice and Assessment of Statistical Predictions, Journal of the Royal Statistical Society, Ser. B, Vol. 36, pp. 111-147.
[36] S. Amari, N. Murata, K.-R. Müller, M. Finke and H. Yang, Asymptotic Statistical Theory of Overtraining and Cross-Validation, IEEE Trans. on Neural Networks, Vol. 8, No. 5, 1997, pp. 985-998.
[37] S. Lawrence, C. Lee Giles and Ah Chung Tsoi, What Size Neural Network Gives Optimal Generalization? Convergence Properties of Backpropagation, Technical Report UMIACS-TR-96-22 and CS-TR-3617, Institute for Advanced Computer Studies, University of Maryland, 1996.
[38] S. Saarinen, R. Bramley and G. Cybenko, Ill-conditioning in Neural Network Training Problems, SIAM Journal on Scientific and Statistical Computing, 1991.
[39] L. Ljung and J. Sjöberg, A System Identification Perspective on Neural Networks, 1992.
[40] B. Pataki, G. Horvath, Gy. Strausz and Zs. Talata, Inverse Neural Modeling of a Linz-Donawitz Steel Converter, e & i Elektrotechnik und Informationstechnik, Vol. 117, No. 1, 2000, pp. 13-17.
[41] G. Horvath, B. Pataki and Gy. Strausz, Black Box Modeling of a Complex Industrial Process, Proc. of the 1999 IEEE Conference and Workshop on Engineering of Computer Based Systems, Nashville, TN, 1999, pp. 60-66.
[42] M. Deistler, Linear Dynamic Errors-in-Variables Models, Journal of Applied Probability, Vol. 23, 1986, pp. 23-39.
[43] J. Van Gorp, J. Schoukens and R. Pintelon, Learning Neural Networks with Noisy Inputs Using the Errors-in-Variables Approach, IEEE Trans. on Neural Networks, Vol. 11, No. 2, 2000, pp. 402-414.
[44] G. Horvath, L. Sragner and T. Laczó, Improved Model Order Estimation by Combining Errors-in-Variables and Lipschitz Methods, a forthcoming paper.
[45] P. Sollich and A. Krogh, Learning with Ensembles: How Over-fitting Can Be Useful, in D. S. Touretzky, M. C. Mozer and M. E. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8, MIT Press, 1996, pp. 190-196.
[46] R. A. Jacobs, M. I. Jordan, S. J. Nowlan and G. E. Hinton, Adaptive Mixtures of Local Experts, Neural Computation, Vol. 3, No. 1, 1991, pp. 79-87.
[47] P. Berenyi, G. Horvath, B. Pataki and Gy. Strausz, Hybrid-Neural Modeling of a Complex Industrial Process, Proc. of the IEEE Instrumentation and Measurement Technology Conference, Vol. III, 2001, pp. 1424-1429.
Chapter 5
Neural Techniques in Control
Andrzej PACUT
Institute of Control and Computation Engineering, Warsaw University of Technology
Nowowiejska 15/19, 00-663 Warsaw, Poland
Abstract Ideas that come to controls from neural networks extend the existing
control methodology beyond the classical standards. We discuss such neural
techniques developed in various branches of classical control. We first introduce some
approximation properties of neural networks important for dynamic systems, and
identification techniques based on dynamic backpropagation, showing on examples
how to calculate gradients in complex dynamic structures. We then discuss
the input-output representations of nonlinear dynamic systems and their neural
approximators. We then demonstrate the usefulness of neural networks in well-established
control techniques that are used to solve stabilization tasks, tracking problems, and
optimal control problems for nonlinear systems, and in particular for unknown
nonlinear systems.
5.1. Neural control
The subject of this chapter lies at the intersection of controls and neural networks. Neural
network methods play here an auxiliary role with respect to the methodology rooted in
control theory. Control theory and practice, developed broadly for linear systems with known
parameters, meet obstacles when it comes to linear systems with unknown (and possibly
varying) parameters, and to nonlinear systems. While very deep and elegant theoretical methods
have been developed in these areas, they are not easily turned into implementations, often because
only existential results are available and/or the relations to be solved are complex. Neural networks
seem to overcome these problems, serving as a general way to approximate various nonlinear
static and dynamic relations. Neural networks are thus being built into control systems
as approximators; such systems may be called neural control systems. We stress, however,
that this expression may not be fully justified: the overall methodology, structure, and inner sense
of such neural control systems come from control theory, and the neural networks are only
supplemental.
Applications of neural networks as elements of control systems raise new control-theoretical
problems, related to the influence of local approximation errors on the global
performance of such systems. While initially only simulations supported the ideas, presently
more and more theoretical results prove the soundness of neural approximations in control
systems. The initial reservation among control practitioners, caused by the lack of theoretical
performance guarantees for neural control systems, may now be overcome, and control
practice may reach into new territories.
In this chapter we would like to show how neural networks can extend the control
methodology beyond the standard areas. We hope to interest control people in the various
ideas coming from neural networks that have been applied in control. The chapter is also directed
to control practitioners who may want to extend their tools. It may also be useful to neural
network specialists, by showing what control theory needs from neural networks. To facilitate
reading, we also outline some control background beyond linear control that may be less known
to non-specialists.
This chapter by no means presents all the neural network methods that have been applied
with success in controls; we have certainly missed many interesting and important ideas. The
number of papers in this area runs into several hundred a year. The topics presented surely reflect
the personal interests of the author, which may not always correspond to the importance of
the material. We also decided to restrict the discussion to discrete-time, continuous-valued
problems. We nevertheless hope to show the wealth of new ideas that come to controls from neural
networks.
Control methods that employ neural networks are as old as neural networks themselves.
In fact, at the very early stage of neural network development, some well-known adaptive
control techniques were relabeled as neural network techniques. Thanks to intensive
research over many years, both in control and in neural networks, some ideas proved their
usefulness and were supported and refined theoretically, while others were shown to
be too optimistic and sometimes without merit. At present, it is clear that the key property
of neural networks employed in control is their ability to approximate arbitrary functions
with arbitrary accuracy. Probably equally important is the existence of simple methods to
calculate the gradients needed in control and identification algorithms, related to gradient
backpropagation. Finally, very important yet less developed are control structures influenced
by biological control structures. Contemporary neural control structures typically use well-proven
control techniques, with neural networks used as approximators of structural elements
that are unknown or too complex to be used without approximations. Neural approximations
of structural elements of control schemes are used even if the exact solutions are known,
but a simple though approximate solution is preferred. Consequently, all classical branches
of control have developed "neural techniques". There also exist new control schemes that were
developed under a direct influence of neural modeling, with control schemes shaped by the
existing biological control systems. At present, the neural control area is still evolving, proving
or disproving the usefulness of the plethora of existing techniques.
Notation. We use standard mathematical notation, so we introduce here only some specific
matters that may cause misunderstanding. All vectors are assumed to be column. We use
calligraphic letters to denote sets (with some exceptions like R that stands for real numbers),
bold letters for matrices and vectors, and italic letters for their elements and other scalars.
Time is indicated in parentheses, and lower indexes are used to denote elements of vectors
or matrices. We denote the time delay operator by q^{-1}, and the n-step delay by q^{-n}. We
introduce the tapped-delay-line operator q̄^{-n}, namely a column vector operator that consists of the
identity and n − 1 consecutive delays, i.e.

q̄^{-n} x(k) = [x(k)  x(k−1)  ⋯  x(k−n+1)]^T    (1)
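The tapped-delay-line operation can be sketched as follows. The zero padding before time 0 and the function name are assumptions of this sketch:

```python
import numpy as np

def tapped_delay(x, k, n):
    """Tapped-delay line: return [x(k), x(k-1), ..., x(k-n+1)] as a column vector.
    Samples before time 0 are taken as zero (an assumed padding convention)."""
    taps = [x[k - i] if k - i >= 0 else 0.0 for i in range(n)]
    return np.array(taps).reshape(-1, 1)

x = np.array([10.0, 11.0, 12.0, 13.0])
v = tapped_delay(x, k=3, n=3)   # column vector [13, 12, 11]^T
```

Feeding such a vector to a static network is precisely how the dynamic (NARX-type) neural models discussed later are built from feedforward networks.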
We denote partial-derivative (Jacobian) matrices by subscripts, e.g.

f_x = ∂f/∂x    (2)

f_w = ∂f/∂w    (3)
By 0 we denote a zero vector, and by O a zero matrix, regardless of their dimensions
(always clear from the context).
In figures, we use single lines for scalar signals, and double lines for vector signals.
A double line may split into single lines (usually at the input to a functional box) to illustrate
the behavior of signal components, and conversely, single lines can be grouped into a double
line (usually at the output of a functional box) to form a vector signal. This enables
"zooming" into a transformation of the components of a vector signal.
Roadmap. In Sec. 5.2 we first shortly introduce neural networks with the assumption that
the reader is familiar with this topic. We deal only with feedforward networks, and construct
dynamical neural systems only through delays in the signal transmissions. We typically do
not discuss the internal structure of neural networks but rather treat the network as a whole.
In the next subsection we introduce the basic nonlinear systems used in this Chapter. We
then discuss approximation abilities of networks, important for dynamic systems, and point
out a relation between neural approximations and the curse of dimensionality, extremely
important in many control problems.
In Sec. 5.3 we discuss the difference between the chain rule and backpropagation, and
show a simple way to derive the backpropagation formulas. Gradient backpropagation
can surely be derived from the chain rule, but this method obscures the very basic simplicity of
backpropagation. Gradient calculation in complex systems is very substantial in neural
approximations used in control systems. We also give examples of the application of the
presented method.
Section 5.4 is devoted to models of dynamical systems used in control. The very
basic problem here is the input-output representation of dynamical systems and the ability
of neural networks to approximate such dynamic representations. We discuss local
NARX representations, global representations, affine in control approximations, disturbance
modeling, and the notion of relative degree.
The next four sections discuss particular neural techniques in controls. We had a problem
with the categorization of the cases discussed (by type of control task, by control technique
used, by neural methodology involved). Whichever categorization is chosen, there are always
elements of another categorization that are important enough to be discussed separately.
We decided to discuss separately stabilization, tracking, optimal control, and reinforcement
control. In Sec. 5.5 we discuss the use of neural methods in stabilization, in particular in
feedback linearization, Lyapunov method, and in a dead-beat controller method. In Sec.
5.6 we discuss applications of neural networks to tracking, in particular in model reference
control, internal model control, general tracking control, and linearization methods. Next
section 5.7 is devoted to "neural" optimal control, and in particular to finite horizon problems,
predictive control, and dual control. Finally, in Sec. 5.8, we discuss reinforcement control,
namely the heuristic dynamic programming with backpropagated critic and dual heuristic
programming.
5.2. Neural approximations
5.2.1. Static networks
We first consider static neural networks, i.e. functions N : R^q → R^p that map an input
u ∈ R^q into an output y ∈ R^p,

y = N(u)    (4)

whose internal structure has a layer form, namely N = n^(K) ∘ ⋯ ∘ n^(2) ∘ n^(1), such that each layer's
output is the next layer's input. Consequently, only the last, K-th layer sends its output to
the outside world (the output layer), and the remaining layers' outputs are internal signals
(the hidden layers). Moreover, we always assume that each layer can be represented as a given
function γ of an affine transformation of its input, namely
y = γ(Wu + b)    (5)

where W is called the weight matrix, b is the bias vector, and γ_i are the activation functions
(we do not index the layer, its input, and its output, for better readability). Due to this special
form of γ, the transformation in each layer can be separated into elements γ_i(w_i^T u + b_i)
called the neurons, where w_i are the neurons' weights. We finally add that the vector of biases
is customarily treated as the "zero" column of the weight matrix, with the simultaneous
extension of the input vector with a 'zero'-th element equal to one. While this introduction
is quite concise, the reader is referred to the earlier chapters for a more thorough exposition.
By building such static networks into dynamic systems we will create dynamic networks.
A special type of static network that we use most often has only two layers: the linear output
layer of weights V, and the (only) hidden layer of weights W and biases b, whose
activation functions γ are identical (Fig. 1). The family N[γ] of such networks

N[γ] = {N : N(u) = Vγ(Wu + b)}    (6)

is the basic function approximation tool. Networks of this class will be typically used in the
control systems discussed in this chapter.
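A one-hidden-layer network of this class can be sketched as follows. The sizes, the random weights, and the slope parameter are illustrative assumptions:

```python
import numpy as np

def logistic(z, a=1.0):
    """Logistic sigmoid activation gamma(z) = 1 / (1 + exp(-a z))."""
    return 1.0 / (1.0 + np.exp(-a * z))

def network(u, V, W, b):
    """One-hidden-layer network of the N[gamma] class: N(u) = V gamma(W u + b)."""
    return V @ logistic(W @ u + b)

# Hypothetical sizes: 2 inputs, 3 hidden neurons, 1 output.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 2))   # hidden-layer weight matrix
b = rng.normal(size=3)        # hidden-layer bias vector
V = rng.normal(size=(1, 3))   # linear output layer
y = network(np.array([0.5, -1.0]), V, W, b)
```

Note that the output layer is purely linear, so all nonlinearity of the approximator is contributed by the single hidden layer.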
5.2.2. Nonlinear systems
Our main objects of interest are nonlinear time-invariant multiple-input multiple-output
(MIMO) deterministic plants S = (f, h) of the form (Fig. 2)

x(k+1) = f(x(k), u(k))
y(k) = h(x(k))    (7)

where x(k) ∈ R^n is the state vector, u(k) ∈ R^q denotes the plant input, and y(k) ∈ R^p is the plant
output. We typically assume that the origin is a stationary point, i.e. f(0,0) = 0, h(0) = 0.
For convenience we often take p = q, and often specialize to single-input single-output
(SISO) plants, i.e. to p = q = 1. In the context of dynamical systems, neural networks are
typically used as filters that may undergo continual training, with examples presented in an
ordered way, as opposed to traditional network training with a finite number of examples
presented repetitively in an arbitrary order. One may differentiate between the networks' use
as adaptive filters, which undergo continual training with examples continuously fed to the
filter and forming a possibly infinite sequence, and their use as non-adaptive filters, when
the training stops at some point and the filter works non-adaptively afterwards. Recurrent
networks are treated in control as dynamic systems fed with data, rather than as traditional
associative memories.
Figure 1: Zooming into a one-hidden-layer network of the N[γ] class. The network as a function (top); layer
structure shown (middle); neurons shown (bottom).
5.2.3. Approximation problem
Approximation abilities of neural networks are discussed elsewhere in this book, and here
we only stress certain properties important for dynamic systems. We say that a family of
functions N has the universal approximation property (UAP) for a class of functions F if for
any function f ∈ F and any desired accuracy ε there exists a function f̂ ∈ N such that

d(f, f̂) < ε    (8)

where d is a chosen distance between functions. It is known that N = N[γ] has the
UAP for various classes of functions, provided the activation function γ satisfies certain
conditions [11, 14, 20, 18, 48, 19]. Probably the strongest results in this area have been
obtained by Leshno, Lin, Pinkus, and Schocken [32], who proved that N[γ] has the UAP for
continuous functions over any compact set U, for the distance d(f, f̂) = sup_{u∈U} |f(u) − f̂(u)|,
provided γ is not a polynomial. Similarly, N[γ] has the UAP for functions integrable with
p-th power, with d(f, f̂) = (∫_U |f(u) − f̂(u)|^p du)^{1/p}. The distance can also be extended
to d(f, f̂) = (E|f − f̂|^p)^{1/p}, where E is the expected value and f, f̂ are random variables.
The condition of being non-polynomial is certainly satisfied by continuous sigmoids, i.e.
functions γ for which lim_{u→−∞} γ(u) = 0 and lim_{u→+∞} γ(u) = 1, including the most popular logistic sigmoid
activation function

γ(z) = 1 / (1 + exp(−a z))    (9)
It is important in many control applications that the approximation is performed for a function
together with its derivatives. For any nonnegative integer-valued vector k = [k_1, …, k_q]^T,
denote by D^k f the derivative

D^k f = ∂^{|k|} f / (∂u_1^{k_1} ⋯ ∂u_q^{k_q}),   |k| = k_1 + ⋯ + k_q    (10)

The supremum of the derivative discrepancy over the set of interest,

sup_{u∈U} |D^k f(u) − D^k f̂(u)|    (11)

leads to the distance

d_m(f, f̂) = max_{|k| ≤ m} sup_{u∈U} |D^k f(u) − D^k f̂(u)|    (12)

which takes into account discrepancies between the function and its model, as well as between the
function and the model derivatives up to a certain order. It is proven by Hornik, Stinchcombe,
and White [21] that for any rapidly decreasing function f, any compact set U, and the
approximation accuracy (12), the networks N[γ] have the UAP provided γ is l-finite, i.e.
it is l-times continuously differentiable, and 0 < ∫ |γ^(l)(z)| dz < ∞. The logistic sigmoid
function (9) is l-finite for any l > 0, and the Gaussian activation functions used in RBF networks
are l-finite for any l > 0. Note that polynomials and sinusoidal functions are not l-finite for
any l.
5.2.4. Approximation of sequences
Control applications require that, to be able to approximate (discrete) dynamic systems,
the approximation theorems be extended from function approximation to approximation of
sequences of functions. A discrete dynamic system has approximately finite memory if for
an arbitrary ε there exists a window of integer length T > 0 such that for all t, and all inputs
u = {u(k), k ≥ 0}, one has

|y(k) − y_{t,T}(k)| < ε    (13)

where {y(k), k ≥ 0} is the output to u, and {y_{t,T}(k), k ≥ 0} is the output to the windowed input
{u_{t,T}(k), k ≥ 0} defined as

u_{t,T}(k) = u(k) for t − T < k ≤ t, and u_{t,T}(k) = 0 otherwise    (14)

It is proved by Sandberg [51] that for causal, time-invariant, approximately finite-memory
single-output systems, the approximating networks N[γ] have the UAP, namely for an arbitrary
ε > 0 there exists a network N ∈ N[γ] such that

|y(t) − N(q̄^{-T} u(t))| < ε    (15)

uniformly for all inputs u. The network input is equal to q̄^{-T} u(t), i.e. it consists of the
current and delayed system inputs. Sandberg's method can be used to generalize function
approximation results to discrete dynamic systems.
5.2.5. No curse of dimensionality?
The famous result of Barron [3] gives an upper bound on the size of the hidden layer
that does not depend on the dimension of the input space, thus showing the lack of the
"curse of dimensionality" for neural approximators. More precisely, consider one-hidden-layer
networks N[γ] where γ is a bounded continuous sigmoid. Suppose that a function
f : R^q → R is to be approximated for ||u|| ≤ r. Define the approximation error d(f, f̂) by

d(f, f̂) = (E|f − f̂|^2)^{1/2}    (16)

assuming that the distribution of u is concentrated over ||u|| ≤ r, i.e. P{||u|| ≤ r} = 1. Denote
by f̃(ω) = ∫ exp(jω^T u) f(u) du, ω ∈ R^q, the (q-dimensional) Fourier transform of f. The
integral

C_f = ∫ ||ω|| |f̃(ω)| dω    (17)

can be regarded as a function complexity index. Barron's theorem [3] states that if the
complexity index C_f of the approximated function f is finite, then there exists an N[γ] network
with n hidden neurons such that the approximation error is bounded by

ε ≤ 2 r C_f / √n    (18)

In other words, for an approximation error bounded by ε_0, the number of hidden neurons

n ≤ 4 r^2 C_f^2 / ε_0^2    (19)
does not depend on the input dimension q. This result shows a computational advantage of
neural networks over other approximators, such as polynomial approximations, splines, etc.,
for which the required number of parameters grows exponentially with the input space
dimension. Barron's result does not yet fully solve the dimensionality issue for neural
networks, since the bound (19) depends on the function complexity index C_f, which may itself
depend on q. This problem, extremely important for control applications, is still under intensive
research.
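The bound (19) is easy to evaluate numerically. The sketch below simply computes it for illustrative values of r, C_f, and ε_0 (the numbers are assumptions, since for a concrete f the index C_f must be obtained from its Fourier transform, eq. (17)):

```python
import math

def barron_hidden_units(r, C_f, eps0):
    """Upper bound (19) on the number of hidden neurons sufficient for
    approximation error eps0: n <= 4 r^2 C_f^2 / eps0^2 (independent of q)."""
    return math.ceil(4.0 * r**2 * C_f**2 / eps0**2)

# Illustrative values only: unit ball (r = 1), complexity index 10,
# target error 0.5 -> 1600 hidden neurons suffice, for any input dimension q.
n = barron_hidden_units(r=1.0, C_f=10.0, eps0=0.5)   # -> 1600
```

Note how halving the target error quadruples the bound, while changing the input dimension q does not enter the formula at all (it can only enter indirectly, through C_f).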
5.3. Gradient algebra
Gradient backpropagation is probably the buzzword of neural networks. It has in fact two
different meanings: (A) a method of gradient calculation, as introduced by Werbos [59,
62], and, even more commonly, (B) a gradient method of adjusting the network's weights,
with the gradients calculated with the use of backpropagation (meaning A) [50]. Here we will
discuss backpropagation in its first meaning.
5.3.1. Layered systems of functions
To show the difference between the chain rule and backpropagation, consider first a composite
function of a single variable x_0 presented in the form of a layered family of functions, namely

x_k = f_k(x_{k−1}),   k = 1, …, n    (20)

with the appropriate relations between the function domains, whose typical example is the
multilayer perceptron. Denote f′_k = df_k/dx, and write x′_k = dx_k/dx_0 for the derivative of x_k
with respect to the independent variable. The derivative

dx_n/dx_0 = f′_n(x_{n−1}) f′_{n−1}(x_{n−2}) ⋯ f′_1(x_0)    (21)

can be calculated recursively in many ways, the most important being the chain rule algorithm
and the backpropagation algorithm. Let us first apply a "right-to-left grouping" of terms in
(21), namely

dx_n/dx_0 = f′_n(x_{n−1}) (f′_{n−1}(x_{n−2}) (⋯ (f′_2(x_1) f′_1(x_0))))    (22)

which leads to the recursion

x′_k = f′_k(x_{k−1}) x′_{k−1},   x_k = f_k(x_{k−1}),   k = 1, …, n    (23)

and is commonly termed the chain rule; it may also be called forward-propagation. If a
"left-to-right grouping" of terms is applied to (21), namely

dx_n/dx_0 = (((f′_n(x_{n−1})) f′_{n−1}(x_{n−2})) ⋯) f′_1(x_0)    (24)

we obtain the backpropagation algorithm

x̄_k = x̄_{k+1} f′_{k+1}(x_k),   x̄_n = 1,   k = n−1, …, 0,   where x̄_k = dx_n/dx_k    (25)
In the above formulas we omitted the calculation of the values of the variables, which is
identical for both algorithms. An apparent difference between the two algorithms consists in the order
of calculations. More important, though, are the intermediate derivatives calculated by the two
algorithms. In the chain rule, we calculate derivatives of the intermediate variables x_k with respect
to the same independent variable x_0, to end up with the derivative of x_n. On the other hand,
the backpropagation formula consists in calculating derivatives of the same variable x_n with
respect to the intermediate variables x_k, to end up with the derivative with respect to x_0.
The above observations enable us to formulate the basic generic principles behind the chain
rule and the backpropagation. For any two variables u = x_k, z = x_l, k < l, the chain rule can
be compactly written as

dz/du = Σ_x (∂f_z/∂x) (dx/du)    (26)

where by f_z we denote the function that defines z, and the sum extends over all variables
x that directly influence z through f_z (i.e., the arguments of f_z), Fig. 3. On the other hand, for
the same two variables u = x_k, z = x_l, k < l, the backpropagation formula has the form

dz/du = Σ_x (dz/dx) (∂f_x/∂u)    (27)

where f_x denotes the function that defines x, and the summation extends over all variables x
that are directly influenced by u (i.e., over all functions one of whose arguments is u), Fig. 4.
The above two formulas show the essence of the difference between the two algorithms.
While for the chain rule one needs derivatives with respect to all variables that influence
a given intermediate variable, backpropagation calls for derivatives of all variables that
are influenced by the present variable. Knowing this, the derivation of the gradient for even
complicated neural networks is almost trivial. In matrix form, the two methods differ in the
order of matrix multiplication [44].
5.3.2. Gradient calculations in nonlinear dynamic systems
Suppose that for a dynamic plant (7) both f and h are approximated by neural networks f̂ and
ĥ, respectively, namely

x̂(k+1) = f̂(x̂(k), u(k); w)
ŷ(k) = ĥ(x̂(k); v)    (28)

where w and v denote the weight vectors of the two networks. Assume that the networks must
minimize the cost J = Σ_{k=1}^N ||ŷ(k) − y(k)||² of the discrepancy between the model output ŷ
and the desired output y. We show how the chain rule and the backpropagation work for
this system. To avoid cluttering the formulas, we often skip the intermediate arguments of
functions, e.g., we write f̂(k) instead of f̂(x̂(k), u(k); w).
We first calculate the derivative dJ/dv (Fig. 5), where v is any element of v. Since v
influences all output coordinates at every moment k, we have

dJ/dv = Σ_{k=1}^N Σ_{j=1}^p (∂J/∂ŷ_j(k)) (∂ĥ_j(k)/∂v)    (29)
(*)
*(*)
y(*)
/7
r^rtx^n h g>
AV
where N is the number of data points. In turn, every output coordinate y(k) directly influences
the cost index, hence
dj
(30)
'" y(*)
<<
(31)
dv
k=1 j=1
The derivatives ∂h_j(k)/∂v can themselves be calculated with the use of backpropagation once the structure of the h network is known. To calculate dJ/dw, where w is any element of w (Fig. 6),
we notice first that w influences all state coordinates at every moment k. Consequently
dJ/dw = Σ_k Σ_i (∂J/∂x_i(k+1)) (∂f_i(k)/∂w)    (32)
where the partial derivative can itself be calculated with the use of backpropagation once the
structure of the network f is known. Since every coordinate xi(k) of the state vector influences
every coordinate of the present output vector yj(k) through the observation equation, and
every coordinate of the state at the next moment (except at the last moment) through the state
equation, we have
∂J/∂x_i(k) = Σ_j (∂J/∂ŷ_j(k)) (∂h_j(k)/∂x_i(k)) + Σ_j (∂J/∂x_j(k+1)) (∂f_j(k)/∂x_i(k))  for k < N

∂J/∂x_i(N) = Σ_j (∂J/∂ŷ_j(N)) (∂h_j(N)/∂x_i(N))    (33)
where the partial derivatives again can be calculated by backpropagating through the network
once the structures of the networks f and h are known. Finally, every output coordinate
directly influences the cost index, hence
∂J/∂ŷ_j(k) = 2 (ŷ_j(k) − y_j(k))    (34)

Introducing the sensitivity gradients J_x(k) = ∂J/∂x(k), J_y(k) = ∂J/∂ŷ(k), and the accumulated gradients J_w(k), J_v(k), we may rewrite the backpropagation equations for both networks in a form of linear recurrent equations with time reversed, namely

J_x(k) = f_x(k)^T J_x(k+1) + h_x(k)^T J_y(k),  J_x(N+1) = 0    (35)
J_w(k) = J_w(k+1) + f_w(k)^T J_x(k+1),  J_w(N+1) = 0    (36)
J_v(k) = J_v(k+1) + h_v(k)^T J_y(k),  J_v(N+1) = 0    (37)

where f_x, h_x, f_w, h_v denote the corresponding Jacobian matrices, so that the desired gradients are dJ/dw = J_w(1) and dJ/dv = J_v(1).
Alternatively, the chain rule propagates the sensitivities forward in time. Since any output depends only on the state variables at the same moment, we obtain

∂ŷ_j(k)/∂w = Σ_i (∂h_j(k)/∂x_i(k)) (∂x_i(k)/∂w)    (38)

Finally, since the state depends on the weights directly and through the previous values of all state variables, we have

∂x_i(k)/∂w = ∂f_i(k−1)/∂w + Σ_j (∂f_i(k−1)/∂x_j(k−1)) (∂x_j(k−1)/∂w)    (39)
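The reversed-time recurrences can be exercised on a scalar toy model (all functions and numbers below are hypothetical, chosen only for illustration): dJ/dw accumulated backwards in time agrees with a finite-difference estimate.

```python
import math

# Scalar neural state-space model x(k+1) = tanh(w*x(k) + u(k)), y(k) = v*x(k),
# cost J = sum_k (y(k) - t(k))^2.

def simulate(w, v, x0, u, t):
    """Run the model; return states, outputs, and the cost J."""
    x = [x0]
    for uk in u:
        x.append(math.tanh(w * x[-1] + uk))
    y = [v * xi for xi in x[1:]]
    J = sum((yi - ti) ** 2 for yi, ti in zip(y, t))
    return x, y, J

def grad_w(w, v, x0, u, t):
    """dJ/dw by running the sensitivity recursion backwards in time."""
    x, y, _ = simulate(w, v, x0, u, t)
    N = len(u)
    lam_next, gw = 0.0, 0.0                  # lam_next = dJ/dx(k+1)
    for k in range(N, 0, -1):
        lam = 2 * (y[k - 1] - t[k - 1]) * v  # output path: h_x^T dJ/dy(k)
        if k < N:
            lam += w * (1 - x[k + 1] ** 2) * lam_next   # state path: f_x^T J_x(k+1)
        gw += (1 - x[k] ** 2) * x[k - 1] * lam          # direct weight path f_w
        lam_next = lam
    return gw

w, v, x0 = 0.7, 1.3, 0.2
u = [0.5, -0.3, 0.8, 0.1]
t = [0.1, 0.0, -0.2, 0.3]
g_bptt = grad_w(w, v, x0, u, t)
h = 1e-6
g_fd = (simulate(w + h, v, x0, u, t)[2] - simulate(w - h, v, x0, u, t)[2]) / (2 * h)
assert abs(g_bptt - g_fd) < 1e-6
```

The backward sweep touches each time step once, regardless of how many weights the model has, which is exactly the advantage of the time-reversed formulation.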
5.4. Representations of dynamic plants
5.4.1. Local representations
Consider first the linear plant

x(k+1) = F x(k) + G u(k),  y(k) = H x(k)    (40)

where F, G, and H are matrices of appropriate dimensions. The first question we shortly discuss is whether the plant is observable, i.e. whether its state can be recovered from a finite number of input and output values. The present and future outputs y(k+i), i ≥ 0, are linear combinations of the present state x(k) and of the inputs, namely
y(k+i) = H F^i x(k) + Σ_{j=1}^{i} H F^{j−1} G u(k+i−j)    (41)
hence the state x(k) can be recovered from a finite number of future inputs and outputs once
the observability matrix W0 defined as
W_o = [H; HF; …; HF^{n−1}] ∈ R^{np×n}    (42)
has full column rank, i.e. rank(W_o) = n. Moreover, the state x(k) then depends linearly on the future inputs and outputs in (41), namely

x(k) = φ(q^{−n} y(k+n−1), q^{−n+1} u(k+n−2))    (43)

where φ is a linear function resulting from solving (41), and q^{−m} s(k) denotes the vector of m consecutive values [s(k), s(k−1), …, s(k−m+1)]. Observability thus enables elimination of the state from the plant equations (40). Indeed, replacing the state with (43) in the formula
y(k+n) = H F^n x(k) + Σ_{j=1}^{n} H F^{j−1} G u(k+n−j)    (44)
one may present the output as a linear combination of delayed outputs and inputs, namely
y(k+n) = ψ(q^{−n} y(k+n−1), q^{−n} u(k+n−1))    (45)

where ψ is a linear function. Consequently, every linear observable plant admits the ARX representation of order at most n, namely

y(k+n) = Σ_{i=1}^{n} A_i y(k+n−i) + Σ_{i=1}^{n} B_i u(k+n−i)    (46)

With the delay-vector notation

q^{−n} y(k) = [y(k), y(k−1), …, y(k−n+1)]    (47)

the ARX representation can be written compactly as

y(k+1) = A q^{−n} y(k) + B q^{−n} u(k)    (48)

where A = [A_1 … A_n] and B = [B_1 … B_n]. For SISO systems, A and B become vectors a and b, and we can rewrite the ARX representation (48) as (Fig. 9)

y(k+1) = a^T q^{−n} y(k) + b^T q^{−n} u(k)    (51)
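The claim that every linear observable plant admits an exact ARX representation of order at most n can be checked numerically. The sketch below (plant matrices chosen arbitrarily for illustration, not from the chapter) verifies the rank condition on W_o and then recovers ARX coefficients by least squares from simulated data; the fit residual vanishes up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)

# An observable SISO plant x(k+1) = F x(k) + G u(k), y(k) = H x(k), n = 3.
n, T = 3, 400
F = np.array([[0.5, 0.2, 0.0], [0.0, 0.3, 0.1], [0.1, 0.0, 0.4]])
G = np.array([1.0, 0.0, 0.5])
H = np.array([1.0, 1.0, 0.0])

Wo = np.vstack([H @ np.linalg.matrix_power(F, i) for i in range(n)])
assert np.linalg.matrix_rank(Wo) == n          # the plant is observable

# Simulate, then fit y(k+n) = sum a_i y(k+n-i) + sum b_i u(k+n-i).
u = rng.standard_normal(T)
x = np.zeros(n)
y = np.zeros(T)
for k in range(T):
    y[k] = H @ x
    x = F @ x + G * u[k]

rows = [np.concatenate([y[k:k + n][::-1], u[k:k + n][::-1]]) for k in range(T - n)]
Phi = np.array(rows)
target = y[n:]
theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
resid = np.max(np.abs(Phi @ theta - target))
assert resid < 1e-6                            # exact ARX(n, n) representation
```

The least-squares fit is exact (not merely approximate) because, by Cayley-Hamilton, the characteristic polynomial of F annihilates the free-response term H F^k x(0) regardless of the initial state.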
ARX representation for linearized systems. The ARX representation can also be specified locally for nonlinear plants by linearization around the origin. Consider the nonlinear plant

x(k+1) = f(x(k), u(k)),  y(k) = h(x(k))    (52)

and, to admit the plant linearization around the origin, assume that f and h are continuously twice differentiable and have stationary points at the origin, i.e. f(0, 0) = 0, h(0) = 0. The linearized plant S_L = (F, G, H) has the form (40), where

F = ∂f(x, u)/∂x |_(0,0),  G = ∂f(x, u)/∂u |_(0,0),  H = ∂h(x)/∂x |_0    (53)

Figure 9: ARX representation for linear observable plants.
Figure 10: NARX representation for nonlinear locally observable plants. Note that the multiplications by constants and the summation of the ARX representation (Fig. 9) have been replaced by a nonlinear function.

Practically, such a linearized representation is valid only close to the origin and certainly does not carry the nonlinear properties of the plant.
NARX representation. Consider now the nonlinear plant (7). Similarly to the linear case,
one may express the present and future outputs y(k + i), i > 0, as functions of future inputs
u(k + i), i > 0, and the present state x(k), namely
y(k+1) = h(f(x(k), u(k))) = h^(1)(x(k), u(k))    (54)
y(k+2) = h(f(x(k+1), u(k+1))) = h^(2)(x(k), u(k), u(k+1))    (55)
hence, as in the linear case, the output can be expressed by the delayed inputs and outputs
y(k+n) = ψ(q^{−n} y(k+n−1), q^{−n} u(k+n−1))    (56)

but, unlike in the linear case (43, 45), neither φ nor ψ is linear. Consequently, locally around the origin, the nonlinear system admits the NARX (Nonlinear ARX) representation, namely (Fig. 10)

y(k+1) = ψ(q^{−n} y(k), q^{−n} u(k))    (57)

The nonlinear function ψ can be approximated by a neural network ψ̂ (Fig. 11), namely

y(k+1) = ψ̂(q^{−n} y(k), q^{−n} u(k); w)    (58)

where w denotes the weight vector of the network. Note that while ψ̂ can approximate ψ in an arbitrary region, ψ itself approximates the nonlinear object only locally around the origin, hence the neural approximation remains local.
5.4.2. Global representations
Global representations of nonlinear dynamic systems are, in general, unknown. It was proven by Aeyels [2] for autonomous observable objects that 2n+1 values of the output are sufficient to recover the state. Levin and Narendra [34] proved that if the plant state and observation functions f, g are smooth and the state function f is invertible with respect to the state x(k), then again 2n+1 past values of inputs and outputs are sufficient to recover the state. Practically,
Figure 11: NARX representation, with the nonlinear function ψ modeled by a neural network ψ̂. Note that an arbitrary nonlinear function ψ (Fig. 10) has been replaced by a linearly transformed activation function of an affinely transformed argument.
the state invertibility assumption is not stringent for discrete-time plants, since sampling of
continuous-time models leads to state invertible discrete-time models.
If the order n of the plant is unknown, one must consider NARX models of sufficiently high order m.
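As a sketch of the NARX neural model (58), the code below trains a one-hidden-layer NARX(1) predictor by plain gradient descent on data from a simple nonlinear plant; the plant, the network size, and the learning rate are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Data from a toy nonlinear SISO plant: y(k+1) = y(k)/(1 + y(k)^2) + u(k)^3.
# The NARX(1) predictor y_hat(k+1) = psi_hat(y(k), u(k); w) is a tanh network.
T = 600
u = rng.uniform(-1, 1, T)
y = np.zeros(T + 1)
for k in range(T):
    y[k + 1] = y[k] / (1 + y[k] ** 2) + u[k] ** 3

X = np.column_stack([y[:T], u])        # regressor (y(k), u(k))
t = y[1:]                              # target y(k+1)

nh = 16
W1 = 0.5 * rng.standard_normal((2, nh)); b1 = np.zeros(nh)
W2 = 0.5 * rng.standard_normal(nh);     b2 = 0.0

lr = 0.05
losses = []
for epoch in range(1500):
    h = np.tanh(X @ W1 + b1)           # hidden layer
    p = h @ W2 + b2                    # one-step prediction
    e = p - t
    losses.append(np.mean(e ** 2))
    # backpropagation through the two layers
    gp = 2 * e / T
    gW2 = h.T @ gp; gb2 = gp.sum()
    gh = np.outer(gp, W2) * (1 - h ** 2)
    gW1 = X.T @ gh; gb1 = gh.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

assert losses[-1] < 0.5 * losses[0]    # the one-step prediction error shrinks
```

Note that the trained network is only a one-step-ahead predictor; iterating it to simulate the plant over many steps is a stronger requirement, as the discussion of local versus global representations above indicates.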
5.4.3. Affine-in-control representations
Several useful approximate representations can be derived from NARX; we discuss only the
SISO objects to simplify the notation. By Taylor expansion of ψ around (q^{−n} y(k), 0) one obtains an approximate representation affine in the present and past inputs, namely

y(k+1) = ψ_0(q^{−n} y(k)) + Σ_{i=1}^{n} ψ_i(q^{−n} y(k)) u(k−i+1)    (59)
which can be realized as a scalar product of the output of an (n+1)-output neural network ψ̂ and the vector of delayed plant inputs extended with a constant, namely (Fig. 12)

y(k+1) = ψ̂(q^{−n} y(k); w) ∘ [1, q^{−n} u(k)]    (60)

where w denotes the weights of the network, and ∘ denotes the scalar product of vectors, a ∘ b = a^T b.
Similarly, by Taylor expansion around (q^{−n} y(k), 0, q^{−n+1} u(k−1)), one obtains another approximate representation, affine in the present input yet nonlinear in the past inputs, namely

y(k+1) = ψ_0(q^{−n} y(k), q^{−n+1} u(k−1)) + ψ_1(q^{−n} y(k), q^{−n+1} u(k−1)) u(k)    (61)

which can be realized with a 2-output neural network ψ̂ (Fig. 13), namely

y(k+1) = ψ̂(q^{−n} y(k), q^{−n+1} u(k−1); w) ∘ [1, u(k)]    (62)
Consequently, linearization of the entire plant S = (S_x, S_v) leads to a linear plant S_L = (F, G, H).
Assume that both S_x and S_v are observable. It can be proven [37] that the entire linearized plant S_L is observable if all eigenvalues of the linearized noise dynamics are different from the zeros of the noise-output transfer function, and then S_L admits the representation ARX(n+s, n+s). Consequently, by the same argument as before, the nonlinear object with unmodeled dynamics locally admits the representation NARX(n+s). Since practically little is known about the unmodeled dynamics, the order of the NARX model must be increased until the required accuracy is achieved.
Various modifications of NARX structures are possible if the disturbances are modeled
by stochastic processes [35]. Since mostly deterministic control aspects are discussed in this
chapter, we do not discuss this subject due to lack of space.
5.4.5. Relative degree and alternative NARX models
Recall that for the linear SISO system L = (F, G, H) (40) we have, by (44),

y(k+n) = H F^n x(k) + Σ_{j=1}^{n} M_j u(k+n−j)    (66)

where M_j = H F^{j−1} G ∈ R. The relative degree rd(L) is defined as the delay in the input-output transmission, namely the d that satisfies

M_1 = … = M_{d−1} = 0 and M_d ≠ 0
If rd(L) = d then

y(k+d) = H F^d x(k) + H F^{d−1} G u(k)

This allows for the following ARX predictor representation of the linear system:

y(k+d) = Σ_{j=1}^{n} a_j y(k−j+1) + Σ_{j=1}^{m} b_j u(k−j+1)    (67)

where m = n − d + 1.
For the nonlinear plant (7), define

v_k(x, u) = ∂/∂u [ h(f^(k−1)(f(x, u))) ]    (68)

where we denoted f^(1)(x) = f(x, 0) and f^(k)(x) = f(f^(k−1)(x), 0). The local relative degree is equal to d,

lrd(S) = d    (69)

if there exists a neighborhood D of (0, 0) such that

v_k(x, u) = 0 for all (x, u) ∈ D and all k < d, and v_d(0, 0) ≠ 0    (70)

In all other cases we say that the local relative degree is not well defined. In other words, lrd(S) is not well defined if for some k and some neighborhood D ∋ (0, 0)

v_k(0, 0) = 0 and v_k(x, u) ≠ 0 for some (x, u) ∈ D, (x, u) ≠ (0, 0)    (71)

Since for linear systems v_k(0, 0) = M_k, if the local relative degree is well defined then lrd(S) = rd(S_L). Consequently, for SISO systems with a well-defined local relative degree d, one can employ a predictor NARX representation in the form (Fig. 14)

y(k+d) = ψ(q^{−n} y(k), q^{−n} u(k))    (72)
Figure 14: NARX model in the predictor form. The structure is identical to the one shown in Fig. 10, except
that the single delay has been replaced by the d-unit delay.
5.5. Stabilization
The basic problem of regulation consists in stabilization of the plant around a fixed operating
point. Stabilization is also a first step in various other control tasks. While the problem
is solved theoretically for linear plants, nonlinear plants lack constructive solutions. We
first recall some basic issues related to stabilization, and then present several approaches to
the stabilization task for nonlinear plants that use neural approximators. The first method
employs the feedback linearization which makes it possible to employ linear methods to
stabilize a nonlinear system. The second approach also employs the linearization principle,
but instead of finding the linearizing transformation, a nonlinear feedback law is designed to
approximate the Lyapunov function. The last presented approach employs the controllability
properties of nonlinear systems to form a nonlinear dead-beat controller that stabilizes the
nonlinear system in a finite number of steps. Yet another approach to nonlinear system
stabilization is presented in [67].
While all the discussed approaches were theoretically known before, they became
practically implementable due to neural network approximations.
5.5.1. Preliminaries
Controllability. A plant is called controllable in C if every initial state in C can be
transformed to any final state in C in a finite number of steps.
For linear time-invariant controllable systems (40) this transformation can be done in n steps. It is easy to verify the following equation relating the state with the past inputs and state in linear systems (40):

x(k+n) = F^n x(k) + Σ_{j=1}^{n} F^{j−1} G u(k+n−j)    (73)

hence the linear plant is controllable if and only if the controllability matrix

W_c = [G  FG  …  F^{n−1}G]    (74)

has full row rank, i.e. rank(W_c) = n. This also shows that any two states can be transformed one into the other in n steps.
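A minimal numerical illustration of the two facts above, with an arbitrarily chosen plant (not from the chapter): the controllability matrix has full rank, and solving the n-step state equation for the input sequence drives the state to the origin in exactly n steps, i.e. a dead-beat move.

```python
import numpy as np

# Controllability matrix Wc = [G, FG, ..., F^{n-1}G].  Since
# x(n) = F^n x(0) + Wc [u(n-1), ..., u(0)]^T, reaching the origin requires the
# input stack -Wc^{-1} F^n x(0) (Wc is square and invertible for this SISO plant).
F = np.array([[0.0, 1.0], [-0.1, 0.8]])
G = np.array([[0.0], [1.0]])
n = 2

Wc = np.hstack([np.linalg.matrix_power(F, i) @ G for i in range(n)])
assert np.linalg.matrix_rank(Wc) == n          # the plant is controllable

x0 = np.array([[1.0], [-2.0]])
ustack = -np.linalg.solve(Wc, np.linalg.matrix_power(F, n) @ x0)
# ustack holds [u(n-1), ..., u(0)]; apply the inputs in reverse order
x = x0
for k in range(n):
    x = F @ x + G * ustack[n - 1 - k, 0]
assert np.max(np.abs(x)) < 1e-9                # origin reached in n steps
```

The same computation with an arbitrary target state instead of the origin shows that any two states can be connected in n steps, as stated in the text.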
For nonlinear systems the notion of local controllability is useful (see, e.g., [54, 33]), where we require that for every neighborhood V of the origin there exists a neighborhood W of the origin such that every initial state in W can be transformed to any final state in W in a finite number of steps without leaving V.
Stability. Consider now an equilibrium x_e: f(x_e, 0) = x_e, and assume that x_e = 0 (if the equilibrium is not at the origin, the coordinates can always be shifted to the equilibrium). The equilibrium is stable if for any neighborhood V of the origin there exists a neighborhood W of the origin such that if x(0) ∈ W then x(k) ∈ V for all k > 0, and asymptotically stable if, additionally, lim_{k→∞} x(k) = 0. The equilibrium is finite-stable if the limit is achieved in a finite number of steps. If W consists of the whole state space, the origin is globally stable.

If f is Lipschitz continuous in a neighborhood of the equilibrium, and the system is asymptotically stable, then the stability property remains valid also for systems with bounded disturbances

x(k+1) = f(x(k)) + v(k)    (75)

More precisely, the equilibrium is stable under perturbations if for any neighborhood V of the origin there exists a neighborhood W of the origin such that if x(0) ∈ W and v(k) ∈ W for all k, then x(k) ∈ V for all k > 0.

A Lyapunov function for the system is a positive-definite function V whose increment along the trajectories is non-positive,

ΔV(x) = V(f(x)) − V(x) ≤ 0    (76)

for all x ∈ W. If there exists a Lyapunov function then the equilibrium is stable. If, additionally, −ΔV is positive definite, the origin is asymptotically stable.
Stabilizability. If there exists a feedback function g such that the equilibrium point of the closed-loop system is asymptotically stable, then the system is said to be stabilizable (around the equilibrium). It is known (see, e.g., [54]) that if a linear time-invariant plant is controllable then it is stabilizable by a linear state feedback u(k) = Kx(k) or a linear output feedback u(k) = Ky(k). In fact, the eigenvalues of the state transition matrix F + GK, or F + GKH, must just lie inside the unit circle. Since this matrix can always be made nilpotent, the closed-loop system can be led to the origin in a finite number (n) of steps (dead-beat control).
This property has its local extension to nonlinear systems. Namely, if the linearized
system is controllable then the nonlinear system is locally controllable and there exists a
linear state feedback that makes the closed loop system locally asymptotically stable (see, e.g.
[54]). Moreover, there exists a neighborhood of the origin such that a continuous feedback
u(k) = g(x(k))
(77)
moves the state to the origin in at most n steps. It is also proven [33] that if C is the set of states controllable to the origin, then there exists a control law that makes C finite-stable with respect to the origin; the control law in this case, however, is not necessarily continuous.
5.5.2. Stabilization through feedback linearization
A nonlinear plant (7) is feedback linearizable if there exists a transformation (φ, μ) of the state and input to a new state x̄ and input ū, namely

x̄ = φ(x),  ū = μ(x, u)    (78)

with φ invertible and continuously differentiable, such that the transformed system is linear. If such a transformation exists only in a neighborhood of x = 0, u = 0, the system is locally feedback linearizable at the origin. While the conditions for the existence of such transformations are well known [27, 31], they are difficult to verify and are not constructive. Levin and Narendra [33] propose to use neural models of φ and μ. The networks φ̂, μ̂,

x̄ = φ̂(x; v),  ū = μ̂(x, u; w)    (79)

where w, v denote the weight vectors, are trained to make the output of the transformed system follow a desired linear system

x̄(k+1) = F_0 x̄(k) + G_0 ū(k)    (80)

where (F_0, G_0) is a controllable pair in a canonical form    (81)
If the training reduces the transformation error over a region D below ε_1,

sup_{x,u ∈ D} ||φ̂(f(x, u)) − (F_0 φ̂(x) + G_0 μ̂(x, u))|| ≤ ε_1    (82)

then the model f is approximately feedback linearizable, namely the difference between the outputs of the transformed model and the desired linear system is arbitrarily small uniformly in D,

sup_{x,u ∈ D} ||(F_0 x̄ + G_0 ū) − φ̂(f(x, u))|| ≤ ε_2    (83)

with x̄ and ū given by (78). Consequently, the origin is stable under perturbations, and the neural model will converge in n steps to a ball B_δ of arbitrarily small radius centered at the origin, provided ε_1 and ε_2 are sufficiently small. The local feedback linearizability of the unknown plant can yet be verified only indirectly: without this property the learning procedure is not convergent.
The resulting system is shown in Fig. 15. Let the training error be given by J = ½ Σ_{k=1}^{N} ||e(k)||², where e(k) = x̄(k) − x̄_0(k) denotes the difference between the actual output x̄ and the desired output x̄_0. We briefly discuss the way the necessary gradients are calculated; to avoid cluttering the formulas we skip the symbols of all inner functions, for instance we write ∂f_j(k)/∂x_i for ∂f_j(k)/∂x_i(k).
Calculation of the cost derivative with respect to any weight v of the φ̂ network is very simple, namely

dJ/dv = Σ_k Σ_i (∂J/∂x̄_i(k)) (∂φ̂_i(k)/∂v) = Σ_k Σ_i e_i(k) (∂φ̂_i(k)/∂v)    (84)

where ∂φ̂_i(k)/∂v can be calculated by (static) backpropagation through the φ̂ network. Calculation of the derivative with respect to any weight w of the μ̂ network requires backpropagation through time. Introducing, as before, the sensitivity gradients J_x(k) and J_w(k), the gradients can be accumulated by the time-reversed recurrences

J_x(k) = f_x(k)^T J_x(k+1) + φ̂_x(k)^T e(k)    (85)
J_w(k) = J_w(k+1) + μ̂_w(k)^T f_u(k)^T J_x(k+1)    (86)
J_x(N+1) = 0,  J_w(N+1) = 0    (87)

where f_x, f_u, φ̂_x, and μ̂_w denote the corresponding Jacobian matrices. The desired gradient is accumulated in J_w(k) so that dJ/dw = J_w(1). The matrix μ̂_w can be calculated by (static) backpropagation.
If the nonlinear system is not known then it may be identified off-line by another neural
network.
The fact that a linear feedback is designed here for the linearized system makes this
approach valid only in a close vicinity of the equilibrium. Other neural approaches to
feedback linearization are presented in [16].
5.5.3. Stabilization through Lyapunov function adjustment
Consider a nonlinear plant S whose linearized version S_L = (F, G, H) is controllable. It can thus be stabilized by a linear static controller u = −Ky. By the linearization principle (Sontag [54, p. 170]), the one-step increment of the Lyapunov function V = x^T P x for the nonlinear system can approximate the function −x^T Q x in a certain neighborhood C of the origin, for any desired positive-definite matrix Q.
These properties were employed in the construction of a locally stabilizing feedback for
known nonlinear plants, proposed by Yu and Annaswamy [67]. The method consists of
setting up a neural model of the controller for which the Lyapunov function increment is
smaller than a desired increment (Fig. 16). One first selects a positive-definite matrix Q and
a feedback matrix K that makes F - G K H asymptotically stable. The Lyapunov function
corresponding to the selected Q can be obtained by solving for a positive-definite P the
discrete-time Lyapunov equation for the closed-loop system
(F − GKH)^T P (F − GKH) − P = −Q    (88)
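The discrete-time Lyapunov equation can be solved with standard linear algebra. The sketch below (plant and gain chosen arbitrarily, with H = I so that output feedback reduces to state feedback) solves it via the Kronecker-product identity and confirms that the Lyapunov increment of V(x) = x^T P x along the closed loop equals −x^T Q x.

```python
import numpy as np

# Solve (F - G K H)^T P (F - G K H) - P = -Q for P using
# vec(A^T P A) = kron(A^T, A^T) vec(P), with A = F - G K H.
F = np.array([[0.0, 1.0], [-0.2, 0.9]])
G = np.array([[0.0], [1.0]])
H = np.eye(2)
K = np.array([[-0.2, 0.5]])

A = F - G @ K @ H
assert np.max(np.abs(np.linalg.eigvals(A))) < 1.0    # closed loop is stable

Q = np.eye(2)
n = A.shape[0]
vecP = np.linalg.solve(np.eye(n * n) - np.kron(A.T, A.T), Q.reshape(-1, order="F"))
P = vecP.reshape(n, n, order="F")
assert np.all(np.linalg.eigvalsh(P) > 0)             # P is positive definite

x = np.array([[0.7], [-1.1]])
dV = (A @ x).T @ P @ (A @ x) - x.T @ P @ x
assert np.allclose(dV, -x.T @ Q @ x)                 # increment is exactly -x^T Q x
```

In practice one would use a library routine for the Lyapunov solve; the Kronecker form is shown here only because it makes the linearity of the equation in P explicit.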
The training data x are generated uniformly from a certain region C ⊂ R^n, which can be
enlarged during training. For each data point x, one calculates the desired increment ΔV_0 of the Lyapunov function by

ΔV_0(x) = −x^T Q x    (89)
Now, for the same data point one uses the network approximation ĝ of the feedback law to calculate the next state x_+, namely

y = h(x),  u = ĝ(y; w),  x_+ = f(x, u)    (90)

and the actual Lyapunov function increment

ΔV(x) = x_+^T P x_+ − x^T P x    (91)

Since the goal is to obtain a Lyapunov function increment ΔV not greater than the desired one ΔV_0, the cost function takes into account only those x for which e(x) = ΔV(x) − ΔV_0(x) is positive, namely

J = ½ Σ_{x ∈ X} e²(x)    (92)

where X = {x : e(x) > 0}.
Gradient backpropagation for any weight w of the ĝ network thus has the form

dJ/dw = Σ_{x ∈ X} e(x) (∂ΔV(x)/∂w)    (93)

∂ΔV(x)/∂w = 2 x_+^T P f_u(x, u) g_w(y; w)    (94)

where g_w and f_u are Jacobian matrices. As usual, any gradient method can be used for the minimization of J. It is proven in [67] that the resulting closed-loop system is asymptotically stable in some open neighborhood of the origin.
5.5.4. Dead-beat controller
Feedback linearization can be applied only to a class of nonlinear systems. A direct stabilization method working in a more general case has been proposed by Levin and Narendra. Consider the plant

x(k+1) = f(x(k), u(k))    (95)

with the equilibrium at the origin and a bounded, continuously differentiable, Lipschitz state transition function f. The plant is assumed to be known; otherwise the design procedure must start from setting up a neural model of the plant. The method consists in training a neural
dead-beat controller g(x; w) that drives the overall system to the origin in n steps, Fig. 17, if
the initial state x(0) belongs to an origin-centered ball B_ρ of radius ρ > 0. The error function takes into account the distance of the state from the origin after n steps, namely

J = ||x(n)||²  if ||x(n)|| > λ ||x(0)||,  J = 0  otherwise    (96)

where λ, initially close to 1, controls a region in which the n-step mapping realized by the system is a contraction mapping. Decreasing λ may speed up the controller.
The system must be run multiple times, with the initial conditions sampled uniformly in B_ρ. The parameter λ may be increased if learning is not convergent, and decreased to make the control time shorter. The controller parameters can be tuned with any gradient method, with the gradient calculated by error backpropagation.
5.6. Tracking
5.6.1. Preliminaries
Suppose a SISO plant output is to follow the reference signal r_y(k), which is the output of the reference model S_R. We are to find u(k) such that the state x(k) of the closed-loop system is bounded for all k and

lim_{k→∞} (y(k) − r_y(k)) = 0

for (x(0), r_y(0)) in some neighborhood of the origin. For linear controllable plants, if the reference model L_R is linear, observable, and has simple eigenvalues, then a linear feedback from the plant state and the reference-model state solves the output tracking problem if and only if all the eigenvalues are different from the zeros of the transfer function of the plant L. This solution can be extended to nonlinear plants S. If the linearized plant S_L is controllable, and the reference model L_R is linear, observable, and has simple eigenvalues, then the desired control signals u(k) are given by a superposition of a linear function of the state and a nonlinear function of the reference signal, namely
provided all the eigenvalues are different from the zeros of the transfer function of the linearized plant S_L.
Finally, suppose that the reference model S_R is nonlinear. Assume that the reference model S_R is stable and its linearized version has the eigenvalues on the boundary of the unit circle. If S has a well-defined relative degree, and S_L is controllable and satisfies the conditions for linear tracking, then the control that realizes the output tracking is a function of the state and the reference signal,

u(k) = g(x(k), r_y(k))    (97)

Constraining the output to zero,

y(k) = 0 for all k    (98)

the zero dynamics is given by the solution of

x(k+1) = f(x(k), u(k)),  h^(d)(x(k), u(k)) = 0    (99)

where d = lrd(S).
5.6.3. Model reference control
The problem of tracking a setpoint sequence r in such a way that the dynamics of the entire control system is identical to a given stable reference model is referred to as model reference control. Various versions of model reference control are exploited in [65]. We present only a basic simple version discussed in [49]. We consider the plant

y(k+1) = ψ(q^{−n} y(k), q^{−n} u(k)) + v(k)    (100)

where v models the effect of disturbances, d denotes the relative degree, and m = n − d + 1, together with the stable linear reference model

r_y(k+1) = a^T q^{−n} r_y(k) + b^T q^{−n} r(k)    (101)

The model (100) can be transformed to the predictor form (72), namely

y(k+d) = ψ̄(q^{−n} y(k), q^{−m} u(k)) + v̄(k)    (102)

and the filtered reference signal r_y is to be followed by the plant. For a unit static gain, namely if b^T 1 = 1 − a^T 1, the filter does not modify a constant reference trajectory but imposes the trajectory-following dynamics.
If there are no disturbances, by (102) and (101), perfect trajectory following requires

ψ̄(q^{−n} y(k), q^{−m} u(k)) = r_y(k+d)    (103)

hence if the model is feedback linearizable and the reference signal is bounded, there exists a stable control signal u(k) that makes the system follow (103) in the region of interest [40, 33]. In other words, there exists a function g such that

u(k) = g(r_y(k+d), q^{−n} y(k), q^{−n+1} u(k−1))    (104)
If the plant is unknown, it is modeled by a neural network,

ŷ(k+d) = ψ̂(q^{−n} y(k), q^{−m} u(k); w)    (105)

where w denotes the weight vector; then the feedback rule (104) must be replaced by

u(k) = ĝ(r_y(k+d), q^{−n} y(k), q^{−n+1} u(k−1); v)    (106)

where ĝ is an inverse network with weights v, which approximates the inverse of ψ̂ with respect to u(k). This setup requires training both the plant model network ψ̂ and the control network ĝ. The situation simplifies if the plant is modeled by an affine network model (62), namely

ŷ(k+d) = ψ̂_0(q^{−n} y(k), q^{−m+1} u(k−1); w) + ψ̂_1(q^{−n} y(k), q^{−m+1} u(k−1); w) u(k)    (107)

In this case, the inverse network ĝ is not needed, since the control can be simply calculated as

u(k) = (r_y(k+d) − ψ̂_0(·)) / ψ̂_1(·)    (108)
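For the affine model, the inversion is a one-line computation; the only practical concern is guarding the division when the input-gain term approaches zero. A hedged sketch (the clipping threshold f1_min is an implementation choice, not from the chapter):

```python
def affine_control(r_future, f0, f1, f1_min=1e-3):
    """u(k) = (r(k+d) - f0) / f1, with |f1| clipped away from zero."""
    if abs(f1) < f1_min:
        f1 = f1_min if f1 >= 0 else -f1_min
    return (r_future - f0) / f1

# One-step sanity check on a toy affine relation y+ = f0 + f1*u.
f0, f1, r = 0.3, 2.0, 1.0
u = affine_control(r, f0, f1)
assert abs((f0 + f1 * u) - r) < 1e-12
```

The clipping corresponds to the assumption, made repeatedly in the chapter, that the coefficient of the present input is bounded away from zero.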
Robust adaptive control methods suggest various ways to modify the gradient algorithm used to train the networks. Typically, an error threshold is applied that leads to a dead-zone update, namely

w(k+1) = w(k) − γ D(e(k)) ∂e(k)/∂w    (109)

where

D(e) = e − d_0 if e > d_0,  D(e) = e + d_0 if e < −d_0,  D(e) = 0 otherwise    (110)
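A dead-zone error modification of this kind can be sketched in a few lines (the threshold value below is arbitrary):

```python
def dead_zone(e, d0):
    """Zero inside the threshold band, shifted error outside it."""
    if e > d0:
        return e - d0
    if e < -d0:
        return e + d0
    return 0.0

assert dead_zone(0.05, 0.1) == 0.0                 # small errors are ignored
assert abs(dead_zone(0.3, 0.1) - 0.2) < 1e-12      # large errors are shrunk
assert abs(dead_zone(-0.3, 0.1) + 0.2) < 1e-12
```

Freezing the adaptation for small errors prevents the weights from drifting in response to disturbances and model mismatch that the network cannot reduce anyway.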
It is proven by Chen and Khalil [8] that for any threshold d_0 > 0, any set K around the origin, any required network approximation accuracy, and any initial state bound, y(k) − r(k) converges to a ball of radius d_0 centered at the origin, provided the zero dynamics is exponentially stable, the quadratic Lyapunov function approximations are valid in K, x(0) ∈ K, and the initial weight vector is sufficiently close to one that satisfies the approximation accuracy condition on K. Another neural technique for asymptotic tracking is presented in [9].
5.6.4. Internal model control
The tracking method using a reference model may become unsatisfactory if the plant model accuracy becomes too low. It is thus useful to monitor the plant model accuracy to be able to
adjust the control accordingly. The idea of internal model control [49, 23, 29, 13] consists in employing a model of the plant and modifying the reference signal, namely

r*(k) = r(k) − (y(k) − ŷ(k))    (111)
where ŷ denotes the internal model output. Such a design is robust against model inaccuracies and plant disturbances (Fig. 18). To show this design in more detail, we use the original plant equations (100) and approximate the plant with a one-step neural model, namely

ŷ(k+1) = ψ̂(q^{−n} y(k), u(k); w)    (112)

where w denotes the weight vector. The resulting control has the form

u(k) = ĝ(r*(k+d), q^{−n} y(k+d−1); v)    (113)

where ĝ is an inverse network with weights v, which approximates the inverse of ψ̂ with respect to u(k). If the plant is modeled by an affine network model (62), then the inverse network is not needed, since the control can be calculated as

u(k) = (r*(k+d) − ψ̂_0(q^{−n} y(k+d−1))) / ψ̂_1(q^{−n} y(k+d−1))    (114)
Note that while the plant model (112) uses fewer past control values than (105), for d > 1 the controller (114) requires the process outputs y(k+d−1), …, y(k+1), which are unavailable at the moment k. In other words, it is necessary to employ an internal model of the system that predicts the unknown values.
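A toy sketch of the internal model control loop: a first-order plant with a constant disturbance and a deliberately mismatched internal model, controlled by inverting the model only. All coefficients are hypothetical; the point is that correcting the reference by the model error removes the steady-state offset.

```python
# Plant:  y+ = a_p*y + b_p*u + dist   (disturbance unknown to the controller)
# Model:  y_hat+ = a_m*y_hat + b_m*u  (mismatched on purpose)
a_p, b_p, dist = 0.8, 1.0, 0.2
a_m, b_m = 0.75, 1.1

r = 1.0
y = y_hat = 0.0
for k in range(200):
    r_star = r - (y - y_hat)              # reference corrected by model error
    u = (r_star - a_m * y_hat) / b_m      # model inverse: drive y_hat+ to r*
    y = a_p * y + b_p * u + dist
    y_hat = a_m * y_hat + b_m * u

assert abs(y - r) < 1e-6   # offset-free tracking despite mismatch and disturbance
```

The same loop without the model-error correction (r_star = r) would settle with a constant offset, which is what makes the internal-model feedback worth its extra state.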
Training of the control network can be better organized if we use predicted signals, namely the (d−1)-step-ahead output predictor ŷ_p(k) and a future value of the reference signal r_p(k) = r(k+d−1) (Fig. 19). By (112, 113) we obtain

ŷ_p(k+1) = ψ̂(q^{−n} ŷ_p(k), u(k); w)    (115)
u(k) = ĝ(r*_p(k+1), q^{−n} ŷ_p(k); v)    (116)

Finally, the reference model (101) expressed in the predicted signals, and with the modified input (111), has the form

r_p(k+1) = a^T q^{−n} r_p(k) + b^T q^{−n} r*(k)    (117)
Properties of the entire control system (Fig. 20) are analyzed in detail in [49].
5.6.5. General tracking
The general tracking problem is formulated as the problem of tracking an arbitrary reference signal r. Asymptotic tracking consists in finding an analytic function g and a constant N such that, with the control

u(k) = g(x(k), r(k+1), …, r(k+N)),
Figure 19: Training of the controller in the internal model control scheme.
Figure 20: the overall control system with the reference model (block diagram).
the resulting closed-loop system is asymptotically stable for r = 0 and, for every x(0) sufficiently close to 0,

lim_{k→∞} (y(k) − r(k)) = 0    (118)

For exact tracking it is required that, for every ||x(0)|| sufficiently small,

y(k) = r(k) for all k ≥ N    (119)
The input-output tracking problems require determining the control based on past values of the output rather than the state.
Exact tracking has a solution if and only if d = lrd(S) is well defined and the zero dynamics is asymptotically stable. The feedback system tracks the desired signal in d steps, and only r(k+d) is needed (N = d), namely (Fig. 21)

u(k) = ĝ(x(k), r(k+d))    (120)

where ĝ is a neural model of g. Asymptotic tracking is possible under the same conditions, but the solution is not unique. Another neural technique for tracking unknown signals is presented in [53].
5.6.6. Linearization around the desired trajectory
The tracking method proposed in [1] assumes that the plant has the predictor NARX representation (72) in a region of interest, with the relative degree d well defined. The control signal is derived through a linearization of the output-input mapping around the desired
trajectory r at each time instant; the linearization yields the control update

u(k) = u(k−1) + (∂ψ/∂u)^{−1} [r(k+d) − ψ(q^{−n} y(k), q^{−n} u(k−1))]    (121)

with the derivative saturated away from zero,

∂ψ/∂u replaced by ε sgn(∂ψ/∂u) whenever |∂ψ/∂u| < ε    (122)

It is shown that for sufficiently slow reference trajectories r and ε > 0 sufficiently small, the reference signal is asymptotically tracked and the resulting closed-loop system is stable. To build the control signal for unknown plants, it is necessary to know both ψ and ∂ψ/∂u. It is suggested either to set up a neural model of ψ and approximate ∂ψ/∂u, or to set up a neural model of ∂ψ/∂u and approximate ψ. The networks are trained off-line.
5.7. Optimal control
5.7.1. N-stage optimal control
Consider the plant

x(k+1) = f(x(k), u(k))    (123)

with the cost function

J = Σ_{k=0}^{N−1} q(x(k), u(k)) + p(x(N))    (124)

where p penalizes the deviation from the desired final state. The N-stage optimal control problem consists in finding a control rule in feedback form,

u(k) = g_k(x(k), x(N)),  k = 0, …, N − 1    (125)

that minimizes the cost (124) for any x(0) ∈ X(0), x(N) ∈ X(N), where X(0) and X(N) are given compact sets.
The above problem can in principle be solved by dynamic programming. This procedure, however, calls for the state space discretization at each decision stage. The control rule (125) additionally requires that the final state be parameterized, which is equivalent to doubling the size of the state space. Inevitably, the user faces here the curse of dimensionality. The solution proposed by Zoppoli and Parisini [66] consists of approximating the control law by neural networks. Namely,
u(k) = ĝ(x(k), x(N); w(k))    (126)

where the weight vectors w(k) are defined separately for each time moment. In other words, a separate network is assigned to each time moment. The cost (124) can thus be written as a function of all the weights w = (w(0), …, w(N−1)) and of x(0) and x(N), namely

J = J(w; x(0), x(N))    (127)

and the networks are trained to minimize the expected cost

E_{x(0), x(N)} J(w; x(0), x(N))    (128)
where x(0) and x(N) are treated as random variables drawn from a uniform distribution on
X(0) x X(N). Since it is necessary to calculate the gradient with respect to all the weights,
one must backpropagate through all the networks. For any weight w(k) of k-th network we
thus have for k = 0, . . . , N - 1
dJ/dw(k) = ĝ_w(k)^T (∂J/∂u(k))    (129)

where the derivatives with respect to the controls and states are accumulated backwards in time,

∂J/∂u(k) = q_u(k) + f_u(k)^T J_x(k+1)    (130)
J_x(k) = q_x(k) + f_x(k)^T J_x(k+1) + ĝ_x(k)^T (∂J/∂u(k))    (131)

with the initial condition at k = N, namely J_x(N) = p_x(N).
weight adjustment procedures, the Jacobians ĝ_w(k) can also be calculated by backpropagation, once the structure of the networks is decided. Note that we used backpropagation in time, and suggested also backpropagation inside the networks. The gradients originally calculated in [66] use forward-propagation in time, and backpropagation only inside the networks.
The control rule (125) can be parameterized by additional in-between control points, if
the state is to take pre-assigned values at some given time moments. The control rule can
also be made additionally a function of parameters of the plant model or of the cost function. Note, however, that while the control rule will respond optimally to those additional parameters, all the additional arguments of the control rule must be known before the control action takes place.
The above method can also be applied to infinite horizon problems [66, 47]. Other approaches to the infinite horizon optimal control problem are presented in [24].
5.7.2. Predictive control
Predictive control [10, 30, 58] is one of the popular control techniques for infinite horizon problems. It consists in solving an optimal control problem for a certain finite horizon on the basis of a predicted output, applying only the first control out of the entire control sequence, and repeating the procedure at each next time step. We use predictive control to solve a tracking problem for a SISO system with unit relative degree,

y(k+1) = ψ(q^{−n} y(k), q^{−n} u(k))    (132)

For a given desired signal r, the one-step-ahead predictive control u(k) is to be found such that the cost

J = (y(k+1) − r(k+1))²    (133)

is minimized. If the plant is unknown, it is replaced by a neural model,

ŷ(k+1) = ψ̂(q^{−n} y(k), q^{−n} u(k); w)    (134)
where w is the weight vector. In [30] it is proposed to take care of the neural model inaccuracies by a modification of the cost (133), namely

J = (ŷ(k+1) − r(k+1))² + ρ a²(k)    (135)

where ρ is a regularizing coefficient and a(k) is an additional uncertainty parameter. The resulting technique consists, at each step, of solving the extended problem (135) with the input-output constraints to obtain u(k) (136), applying the control to the system, and estimating the uncertainty parameter anew. Detailed conditions for the stability of such control are derived in [30].
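A minimal sketch of the receding-horizon idea for the one-step case: the cost is minimized by direct search over a grid of candidate inputs, and only the minimizing input is applied at each step. The model below is a hypothetical stand-in predictor, not from the chapter, and the plant is taken equal to the model.

```python
import numpy as np

def model(y, u):
    """Stand-in one-step predictor y_hat(k+1) = psi_hat(y(k), u)."""
    return 0.6 * y + np.tanh(u)

def mpc_step(y, r_next, grid=np.linspace(-2, 2, 4001)):
    """Minimize (y_hat(k+1) - r(k+1))^2 over candidate inputs; return the best u."""
    cost = (model(y, grid) - r_next) ** 2
    return grid[np.argmin(cost)]

y, r = 0.0, 0.5
for k in range(30):
    u = mpc_step(y, r)      # only the first move of the horizon is applied
    y = model(y, u)         # plant assumed identical to the model here

assert abs(y - r) < 1e-3
```

With a longer horizon, the grid search would be replaced by a proper nonlinear programming step, but the receding-horizon structure (optimize, apply the first control, repeat) stays the same.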
5.7.3. Dual control
A dual controller simultaneously controls the plant and probes its environment to reduce the uncertainty and to obtain better estimates of the system parameters. A typical dual controller minimizes a certain cost index in a stochastic environment, e.g.

J = E Σ_k q(k)    (137)

where q(k) is the momentary cost, which for tracking problems typically depends on the difference between the reference trajectory r and the actual one y, e.g.

q(k) = ||y(k) − r(k)||²    (138)
While the appropriate Bellman equations can solve the problem in the dynamic programming
setup, this is in most cases too time-consuming to be of practical value. Note that
(139)
where denotes the conditional expected value conditioned on the observations available at
the moment k and the innovation
(140)
is the difference between the actual and the predicted model output. The first term in (139)
penalizes a deviation of the predicted plant output from the reference trajectory, and the
second term is the cost of inaccuracies of the plant output prediction. An influence of the
second (innovation) term on the control, related to a reduction of the model uncertainty, is
called the dual effect. One possible source of prediction errors is the difference between
the model and the actual plant. If the model (e.g., a neural model of a nonlinear plant)
is substituted in place of the plant, and the control is calculated as if the model were identical
to the plant, then the second term in (139) is ignored, and we say that a heuristic certainty
equivalence principle is applied. If it is of interest to diminish the dual effect, one may modify
the cost (138) by subtracting a part of the dual cost, namely [12]
(141)
where ce ∈ [0, 1]. For the maximal ce = 1, the dual effect is neglected entirely and the control
is based on the heuristic certainty equivalence principle. Drawbacks of the heuristic certainty
equivalence, like overshoot and stability problems, can be compensated by performing an
off-line training of the plant model to start the control procedure with the already reduced
plant uncertainty. Additionally, one may modify the cost (141) by adding the term cu·u²(k−1),
cu > 0, that penalizes the control cost.
Consider a SISO plant in the affine NARX form (61)
(142)
where {e} are independent Gaussian N(0, σ²) variables and represent the plant uncertainty.
If the plant equations were known, then the control at k would influence the cost only at the
next moment. Assuming that the control gain is bounded away from zero, the optimal control
can be easily calculated. If the plant equations are not known, and a model is to be used, the
control also influences the estimated model. Suppose we use a model of the form (142), namely
(144)
(145)
where
is the hidden layer output. Consequently, (144) can be
rewritten in a form linear with respect to the output weights
(146)
where
, and
The output
layer weights v appear linearly in the system model, hence one may use a Kalman filter to
update the weight estimates v(k), with the initial conditions v(0) = 0 and
P(0) = c0·I, where c0 is a 'large' parameter. While the
Kalman filter may lead to intensive computations for large networks, various simplifications
of the filter are known.
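Because the output weights v appear linearly, the Kalman filter update sketched above can be written out directly. The regressor values, the noise variance, and the 'large' constant c0 below are illustrative assumptions, not the book's numerical example.

```python
def kalman_weight_update(v, P, phi, y, noise_var):
    """One Kalman-filter step for output weights v of a model linear in v:
    y(k+1) = v^T phi(k) + e(k), with e ~ N(0, noise_var).

    v: weight list, P: covariance matrix (list of lists), phi: regressor.
    """
    n = len(v)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    s = noise_var + sum(phi[i] * Pphi[i] for i in range(n))   # innovation variance
    K = [Pphi[i] / s for i in range(n)]                       # Kalman gain
    err = y - sum(v[i] * phi[i] for i in range(n))            # innovation
    v_new = [v[i] + K[i] * err for i in range(n)]
    P_new = [[P[i][j] - K[i] * Pphi[j] for j in range(n)] for i in range(n)]
    return v_new, P_new

# 'Large' initial covariance P(0) = c0*I and v(0) = 0, as in the text.
c0 = 1e6
v = [0.0, 0.0]
P = [[c0, 0.0], [0.0, c0]]
for phi, y in [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]:
    v, P = kalman_weight_update(v, P, phi, y, noise_var=0.01)
```

After these three consistent measurements the estimates settle near v ≈ [2, −1], while P shrinks from its 'large' initial value, reflecting the reduced model uncertainty.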
Well-known properties of the Kalman filter estimator make it possible to show that
(148)
This allows for calculation of the optimal cost. The model uncertainty is taken into account
through P; it is entirely ignored if ce = 1, and maximally attenuated if ce = 0. The first case,
for cu = 0, is equivalent to the controller based on the heuristic certainty equivalence
principle, while the second corresponds to the cautious controller, where the model
parameters are treated as the actual plant parameters.
5.8. Reinforcement learning
processes, and current intensive research will hopefully make it possible to understand the
reasons for the successes and failures of many others. The reinforcement methods may
overcome the curse of dimensionality due to parametric approximations of functions (like the
cost-to-go function), which otherwise require an exponential growth of resources with the
problem dimension. Moreover, the reinforcement methodology can be applied to only
approximately known or unknown plants.
This control methodology, coming from biology, has a special reason to be included in
this discussion of neurally-inspired control methods, even though neural networks are not
necessarily built into the reinforcement control schemes. Being bound by the size of this
Chapter, we will not elaborate on various reinforcement control methods, but rather show
how neural approximations are built into reinforcement schemes. The reader interested in the
reinforcement control field is directed to books of Sutton and Barto [55] and Bertsekas and
Tsitsiklis [5].
We introduce some basic reinforcement methods for a time invariant nonlinear plant (7)
with the state fully observed (x = y) and with a state feedback control loop, namely
x(k+1) = f(x(k), u(k))
(151)
u(k) = g(x(k))
(152)
(153)
While we consider only infinite horizon control problems, we may distinguish a special
termination state Xe that stops the control process. The goal of control is to minimize the
discounted cost to go (called also the secondary utility or strategic utility) defined as
(154)
where α ∈ [0, 1) is the discount rate. If α = 0 then the 'long term' extends only for one
moment ahead, R(t) = r(t + 1), and an increase of α enlarges the number of time moments
"practically" taken into consideration in R. In the limit case α = 1 there is no discount.
We generalize the state equations (151) to include state uncertainties, assuming that f is
random. More exactly, we assume that for any initial state and any stationary feedback law
g, the resulting sequence of states is a Markov process, and the conditional distribution Pf(x, u)
of the "next state" f(x, u), given the "previous" state x and control u, does not depend on time.
It is also fruitful to extend the deterministic control rule (152) to a stochastic rule. One
of the reasons for this departure from deterministic optimal control is to allow for better
observation of the environment. Another is to be able to smooth out the control rule in cases
where the control space is finite, e.g. for binary control. One may then define a randomized
control as a differentiable function of a parameter such that, for specific values of the parameter,
we obtain the preassigned binary values. As for the state equations, we may at each time
extend the feedback law (152) by a random element, and assume that each random element is
independent of any other random element, and has identical (time-independent) distribution.
We denote the resulting (time-independent) distribution of g(x) conditioned on x by Pg(x).
Note that if the control rule is deterministic, then the distribution Pg(x) is concentrated
at a single point, i.e. P{g(x) = u} = 1 for u = g(x).
The expected discounted cost for a given control policy represented by a state feedback g
for a plant presently at state x,
J(x) = E[ R(0) | x(0) = x ]
(155)
is called the (state) value function. (Additionally, we define J(xe) = 0.) The expected
discounted cost for a plant presently at state x that applies action u and furthermore continues
with control policy given by g
(156)
is called the action value function Q. It is easy to notice that J and Q are related to each other,
namely
where
Note that J depends directly on the state uncertainty, while Q on
the control uncertainty. By the elimination of either Q or J from the above equations, one
obtains the Bellman equation, namely
A similar relation can be written in terms of the action value function. For deterministic
feedback rules, (157) simplifies to
(160)
and then the Bellman equation (159) also simplifies, namely
Under quite general assumptions, the Bellman equations have unique solutions [4].
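For orientation, the simplified Bellman relation can be written out explicitly. This is a sketch reconstructed from the definitions (151)–(154) under the assumption of a deterministic plant and feedback, not a verbatim copy of the missing equations (159)–(161):

```latex
% Sketch: with deterministic plant f, deterministic feedback u = g(x),
% momentary cost r, and discount rate alpha, the Bellman equation for
% the value function J reads
J(x) \;=\; r\bigl(x,\, g(x)\bigr) \;+\; \alpha\, J\bigl(f(x,\, g(x))\bigr),
\qquad J(x_e) = 0 .
```

The fixed point of this recursion is exactly the discounted cost-to-go of the policy g started from x.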
The reinforcement control we discuss is similar to those considered earlier in this Chapter,
yet with a special way to calculate the cost of control. The cost is calculated by an element
called traditionally the critic. The goal of the critic is to convert instantaneous cost r into a
long term cost.
5.8.1. HDP/BAC: Heuristic Dynamic Programming with Backpropagated Adaptive Critic
The HDP structure proposed by Werbos [60, 61, 63, 64] uses an adaptive critic to approximate
the cost-to-go in a control scheme, and backpropagates the gradient of the cost-to-go to
the controller to approximate the optimal feedback. The adaptive critic consists of an
approximator (like a neural network) that estimates the value function with the use of the
time difference method. Namely, since
d(t) = r(x(t), u(t)) + α J(x(t + 1); w(t)) − J(x(t); w(t))
(162)
then (d(t))² may serve as the error to be minimized with respect to the weights. Consequently
w(t + 1) = w(t) + η d(t) ∂J(x(t); w(t))/∂w,  η > 0
(163)
Note that only the weights of the current estimate (at x(t)) are adapted, and the future estimate
of the value function together with the current reinforcement serve as the desired value.
Consequently, the weights are modified with a one-step delay. Two copies of the critic model
are required in the calculations, Fig. 23. The first network uses the new value of the state x(t + 1)
to calculate the estimate J(x(t + 1); w(t)), and the second network, with identical weights,
uses the previous value of the state x(t) to calculate the estimate J(x(t); w(t)). This makes it
possible to evaluate the time difference (162) and to calculate the gradient of the squared time
difference with respect to the weights of the second network,
g(t) = d(t) ∂J(x(t); w(t))/∂w
(164)
This enables the weight vector adjustment step (for instance, using the simple
gradient method w(t + 1) = w(t) + η g(t)) in both networks. All the elements of the entire
HDP control scheme (controller, plant model, critic) may in general be approximated by
neural networks; it is suggested, however, that the plant model be approximated first. The controller
network weights must be adjusted in such a way as to minimize the approximated
cost-to-go J, as calculated by the adaptive critic, in the system
x(t + 1) = f(x(t), u(t))
(165)
To this end, the backpropagated adaptive critic (BAC) technique can be used [63]. Namely,
the derivative of J with respect to the weights v of the control network is calculated by
backpropagation through the critic and the model networks to the control network (Fig. 24).
5.8.2. Dual heuristic programming
Dual heuristic programming (DHP) is a critic design that uses a different type of critic [60,
63, 64]. Here the critic network approximates the gradient of J with respect to the state
To give a rough idea of this design, we assume smooth differentiability of the
necessary functions and obtain by differentiation of (159) with respect to the state
(166)
Figure 23: The adaptive critic using the time difference weight adjustment.
where px and pu are gradients; fx, fu, and gx are Jacobian matrices; and u = g(x). The above
equation is a basis for an adjustment of the weights w of the estimator
to minimize
where d is the DHP vector equivalent of the scalar time difference in the HDP
design. Here
(167)
where
(168)
with x = x(t), x+ = x(t + 1), and u = g(x(t)) calculated on the basis of the plant model.
As in the HDP design, two copies of the critic model are required in the calculations. The first
network uses the new value of the state x(t + 1) to calculate the estimate of
and the second network, with identical weights, uses the previous value of the state x(t) to
calculate the estimate
. This makes it possible to calculate d(t) and the gradient of
with respect to the weights of the second network
(169)
and the weights of the action network u = g(x; v) can be adjusted to minimize the squared
norm of
(171)
5.9. Concluding remarks
A method for tuning the PID controller without such prior identification is proposed in [17].
A neural network g(y; w) approximates the controller, and is trained to minimize the usual
cost
We have
du(k)
is equal to
(173)
where the Jacobian
(174)
Incidentally, this corresponds to the cost function. After the training, the PID
controller parameters are calculated by the least squares method, with the use of the neural
controller results. The method gives good results also for noisy and open-loop unstable plants,
where most traditional tuning methods fail. Another neural technique for adaptive tuning
of the PID controller is proposed in [22].
5.9.2. Summary
The area of adaptive systems took decades to take shape, having been modified with the
increasing availability of new powerful mathematical and computational tools. One of the
newest of these tools is neural networks. It seems that the enthusiasm not contaminated with
knowledge, seen by Minsky in the early days of neural networks, has converted into
enthusiasm supported by knowledge. We have also faced the reverse side of the same attitude,
namely a scepticism not contaminated by knowledge. At present, the methods provided by
neural networks have matured and seem to be indispensable in control, system modeling,
and identification.
For all the theoretical achievements of neural techniques in control, it is always the simplest
controllers satisfying the demands and constraints that will be chosen in applications. And
usually, neural controllers are rather complex, and may even carry a stigma of something
unusual with unpredictable behavior. Moreover, the mechanisms inside neural
controllers are often little understood by practitioners, hence such controllers may be treated
as practically unsafe. Consequently, to pave the way for neural controllers in particular
applications, traditional controllers must first be proven inadequate. On the other hand, safety
issues, often underestimated by control theoreticians, must be of greater concern.
Even in contemporary solutions there is sometimes too much heuristics and too little theory,
which is especially needed in novel control solutions. For instance, off-line training of
networks is currently still preferred, since system identification in a closed loop is computationally
intensive and on-line training can make the overall system unstable. Namely, if the local
relative degree is not well defined, tracking performance may not be acceptable. Even if
it is well defined but the zero dynamics is not stable, the system identification may work
but the control may grow unboundedly, and the tracking error, at first small, may get out of
control. It is thus important to understand well certain properties of the system to which the
method is to be applied. But
here comes another problem: theoretical system properties must be deduced from models,
which have a different structure than the systems and only in some sense, like a similarity
of output signals, are "close" to the modeled systems. This may not be sufficient to claim
that theoretical properties fulfilled by the model will be fulfilled by the object. Another
theoretical issue that needs more light is the stability of nonlinearly parameterized networks
that are trained in dynamic environments. These areas are still open to research.
Acknowledgement
The author is deeply grateful to Wodek Macewicz for making the final drawings and continual help with
References
[1] O. Adetona, E. Garcia, and L.H. Keel, "A new method for the control of discrete nonlinear dynamic
systems using neural networks," IEEE Trans. on Neural Networks, vol. 11, No. 1, pp. 102–112, Jan. 2000
[2] D. Aeyels, "Generic observability of differentiable systems," SIAM Journal of Control and Optimization,
vol. 19, pp. 595–603, 1981
[3] A. Barron, "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Trans. on
Information Theory, vol. 39, No. 3, pp. 930–945, 1993
[4] D.P. Bertsekas, Dynamic Programming and Optimal Control, vols. I and II, Athena Scientific, Belmont,
Mass., 1995
[5] D.P. Bertsekas and J.N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996
[6] J.B.D. Cabrera and K.S. Narendra, "Issues in the application of neural networks for tracking based on
inverse control," IEEE Trans. on Automatic Control, vol. 44, No. 11, pp. 2007–2027, 1999
[7] F.-C. Chen and C.C. Liu, "Adaptively controlling nonlinear continuous-time systems using multilayer
neural networks," IEEE Trans. on Automatic Control, vol. 39, No. 6, pp. 1306–1310, 1994
[8] F.-C. Chen and H. Khalil, "Adaptive control of a class of nonlinear discrete-time systems using neural
networks," IEEE Trans. on Automatic Control, vol. 40, No. 5, pp. 791–801, May 1995
[9] Y.-C. Chu and J. Huang, "A neural-network method for the nonlinear servomechanism problem," IEEE
Trans. on Neural Networks, vol. 10, No. 6, pp. 1412–1423, Nov. 1999
[10] D.W. Clarke, C. Mohtadi, and P.S. Tuffs, "Generalized predictive control," Automatica, vol. 23, pp.
137–160, 1987
[11] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control,
Signals, and Systems, vol. 2, pp. 303–314, 1989
[12] S. Fabri and V. Kadirkamanathan, "Dual adaptive control of nonlinear stochastic systems using neural
networks," Automatica, vol. 34, No. 2, pp. 245–253, 1998
[13] D. Flynn, S. McLoone, G.W. Irwin, M.D. Brown, E. Swidenbank, and B.W. Hogg, "Neural control of
turbogenerator systems," Automatica, vol. 33, No. 11, pp. 1961–1973, 1997
[14] K. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural
Networks, vol. 2, No. 3, pp. 183–192, 1989
[15] G.C. Goodwin, P.J. Ramadge, and P.E. Caines, "Discrete time multivariable adaptive control," IEEE
Trans. on Automatic Control, vol. 25, pp. 449–456, June 1980
[16] S. He, K. Reif, and R. Unbehauen, "A neural approach for control of nonlinear systems with feedback
linearization," IEEE Trans. on Neural Networks, vol. 9, No. 6, pp. 1409–1421, Nov. 1998
[17] E.M. Hemerly and C.L. Nascimento Jr., "An NN-based approach for tuning servocontrollers," Neural
Networks, vol. 12, pp. 513–518, 1999
[18] K. Hornik, "Approximation capabilities of multilayer feedforward neural networks," Neural Networks,
vol. 4, pp. 251–257, 1991
[19] K. Hornik, "Some results on neural network approximation," Neural Networks, vol. 6, pp. 1069–1072,
1993
[20] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal
approximators," Neural Networks, vol. 2, No. 5, pp. 359–366, 1989
[21] K. Hornik, M. Stinchcombe, and H. White, "Universal approximation of an unknown mapping and its
derivatives using multilayer feedforward networks," Neural Networks, vol. 3, No. 5, pp. 551–560, 1990
[22] S.N. Huang, K.K. Tan, and T.H. Lee, "A combined PID/adaptive controller for a class of nonlinear
systems," Automatica, vol. 37, pp. 611–618, 2001
[23] K.J. Hunt and D. Sbarbaro, "Studies in neural-network-based control," in Neural Networks for Control
and Systems, K. Warwick, G.W. Irwin, and K.J. Hunt, Eds., Peter Peregrinus Ltd., London, U.K., pp.
94–122, 1992
[24] E. Irigoyen, J.B. Galvan, and M.J. Perez-Ilzarbe, "Neural networks for constrained optimal control of
non-linear systems," Proc. of the 2000 International Joint Conference on Neural Networks IJCNN'00,
Como, Italy, 2000
[25] M.S. Iyer and D.C. Wunsch, II, "Dynamic reoptimization of a fed-batch fermentor using adaptive critic
design," IEEE Trans. on Neural Networks, vol. 12, No. 6, pp. 1433–1444, Nov. 2001
[26] S. Jagannathan, "Control of a class of nonlinear discrete-time systems using multilayer neural networks,"
IEEE Trans. on Neural Networks, vol. 12, No. 5, pp. 1113–1120, Sept. 2001
[27] B. Jakubczyk, "Feedback linearization of discrete-time systems," Systems & Control Letters, vol. 9, pp.
411–416, 1987
[28] L. Jin, P.N. Nikiforuk, and M.M. Gupta, "Approximation of discrete-time state-space trajectories using
dynamic recurrent neural networks," IEEE Trans. on Automatic Control, vol. 40, No. 7, pp. 1266–1270,
July 1995
[29] J. Kalkuhl, K.J. Hunt, and H. Fritz, "FEM-based neural-network approach to nonlinear modeling with
application to longitudinal vehicle dynamics control," IEEE Trans. on Neural Networks, vol. 10, No. 4,
pp. 885–897, July 1999
[30] C. Kambhampati, J.D. Mason, and K. Warwick, "A stable one-step-ahead predictive control of non-linear
systems," Automatica, vol. 36, pp. 485–495, 2000
[31] H.G. Lee, A. Araphostathis, and S.I. Marcus, "On the linearization of discrete-time systems,"
International Journal of Control, vol. 45, pp. 1103–1124, 1987
[32] M. Leshno, V. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial
activation function can approximate any function," Neural Networks, vol. 6, No. 6, pp. 861–867, 1993
[33] A.U. Levin and K.S. Narendra, "Control of nonlinear dynamical systems using neural networks:
controllability and stabilization," IEEE Trans. on Neural Networks, vol. 4, No. 2, pp. 192–206, 1993
[34] A.U. Levin and K.S. Narendra, "Recursive identification using feedforward neural networks,"
International Journal of Control, vol. 61, No. 3, pp. 533–547, 1995
[35] L. Ljung, J. Sjoberg, and H. Hjalmarsson, "On neural network model structures in system identification,"
in S. Bittanti and G. Picci (Eds.), Identification, Adaptation, Learning: The Science of Learning Models
from Data, pp. 366–399, Springer-Verlag, Berlin, 1996
[36] A.S. Morse, "Global stability of parameter adaptive systems," IEEE Trans. on Automatic Control, vol.
25, pp. 433–439, June 1980
[37] S. Mukhopadhyay and K.S. Narendra, "Disturbance rejection in nonlinear systems using neural
networks," IEEE Trans. on Neural Networks, vol. 4, No. 1, pp. 63–72, Jan. 1993
[38] K.S. Narendra, "Neural networks for control: theory and practice," Proceedings of the IEEE, vol. 84,
No. 10, pp. 1385–1406, 1996
[39] K.S. Narendra and Y.H. Lin, "Stable direct adaptive control," IEEE Trans. on Automatic Control, vol. 25,
pp. 456–461, June 1980
[40] K.S. Narendra and S. Mukhopadhyay, "Adaptive control of nonlinear multivariable systems using neural
networks," Neural Networks, vol. 7, No. 5, pp. 737–752, 1994
[41] K.S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural
networks," IEEE Trans. on Neural Networks, vol. 1, No. 1, pp. 4–27, 1990
[42] G.W. Ng, Applications of Neural Networks to Adaptive Control of Nonlinear Systems, Research Studies
Press Ltd., Somerset, England, 1997
[43] D.H. Nguyen and B. Widrow, "Neural networks for self-learning control systems," IEEE Control Systems
Magazine, vol. 10, pp. 18–23, 1990
[44] A. Pacut, "Symmetry of backpropagation and chain rule," Proc. of the 2002 International Joint
Conference on Neural Networks IJCNN'02, Honolulu, HI, IEEE Press, Piscataway, NJ, pp. 530–534,
2002
[45] T. Parisini and R. Zoppoli, "Neural networks for feedback feedforward nonlinear control systems," IEEE
Trans. on Neural Networks, vol. 5, No. 3, pp. 436–449, 1994
[46] T. Parisini and R. Zoppoli, "Neural approximations for multistage optimal control of nonlinear stochastic
systems," IEEE Trans. on Automatic Control, vol. 41, pp. 889–895, 1996
[47] T. Parisini and R. Zoppoli, "Neural approximations for infinite-horizon optimal control of nonlinear
stochastic systems," IEEE Trans. on Neural Networks, vol. 9, No. 6, pp. 1388–1408, Nov. 1998
[48] J. Park and I.W. Sandberg, "Universal approximation using radial-basis-function networks," Neural
Computation, vol. 3, pp. 246–257, 1991
[49] I. Rivals and L. Personnaz, "Nonlinear internal model control using neural networks: applications to
processes with delay and design issues," IEEE Trans. on Neural Networks, vol. 11, No. 1, pp. 80–90,
Jan. 2000
[50] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, "Learning internal representations by error
propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D.E.
Rumelhart and J.L. McClelland, Eds., vol. 1, Chap. 8, MIT Press, Cambridge, MA, 1986
[51] I.W. Sandberg, "Approximation theorems for discrete-time systems," IEEE Trans. on Circuits and Systems,
vol. 38, No. 5, pp. 564–566, May 1991
[52] R.M. Sanner and J.-J.E. Slotine, "Gaussian networks for direct adaptive control," IEEE Trans. on Neural
Networks, vol. 3, No. 6, pp. 837–863, 1992
[53] Q. Song, J. Xiao, and Y.C. Soh, "Robust backpropagation training algorithm for multilayer neural tracking
controller," IEEE Trans. on Neural Networks, vol. 10, No. 5, pp. 1133–1141, Sept. 1999
[54] E.D. Sontag, Mathematical Control Theory, Springer-Verlag, New York, 1990
[55] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998
[56] G.K. Venayagamoorthy, R.G. Harley, and D.C. Wunsch, "Comparison of a heuristic dynamic
programming and a dual heuristic programming based adaptive critic neurocontrollers for a
turbogenerator," Int. Joint Conference on Neural Networks IJCNN'00, Como, Italy, 2000
[57] G.K. Venayagamoorthy, R.G. Harley, and D.C. Wunsch, "Excitation and turbine neurocontrol with
derivative adaptive critics of multiple generators on the power grid," Int. Joint Conference on Neural
Networks IJCNN'01, Washington, DC, 2001
[58] L.-X. Wang and F. Wan, "Structured neural networks for constrained model predictive control,"
Automatica, vol. 37, No. 8, pp. 1235–1243, 2001
[59] P. Werbos, "Backpropagation: past and future," IEEE Int. Conference on Neural Networks, San Diego,
California, July 1988, vol. I, pp. 343–353, 1988
[60] P. Werbos, "A menu of designs for reinforcement learning over time," Ch. 3 in W.T. Miller III, R.S.
Sutton, and P.J. Werbos (Eds.), Neural Networks for Control, MIT Press, Cambridge, Mass., pp. 67–95,
1990
[61] P.J. Werbos, "Consistency of HDP applied to a simple reinforcement learning problem," Neural Networks,
vol. 3, pp. 179–189, March 1990
[62] P. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political
Forecasting, Wiley, 1994
[63] P.J. Werbos, "Stable adaptive control using new critic designs," http://xxx.lanl.gov/html/adap-org/981000l,
1998
[64] P.J. Werbos, "New directions in ACDs: keys to intelligent control and understanding the brain," Int. Joint
Conf. on Neural Networks IJCNN'00, Washington, DC, vol. III, pp. 61–67, 2000
[65] B. Widrow and E. Walach, Adaptive Inverse Control, Prentice-Hall, Englewood Cliffs, NJ, 1996
[66] R. Zoppoli and T. Parisini, "Neural approximations for finite and infinite-horizon optimal control," Ch.
12 in O. Omidvar and D.L. Elliott (Eds.), Neural Systems for Control, Academic Press, San Diego, CA,
pp. 317–351, 1997
[67] S.-H. Yu and A.M. Annaswamy, "Stable neural controllers for nonlinear dynamic systems," Automatica,
vol. 34, No. 5, pp. 641–650, 1998
[68] T. Hrycej, Neurocontrol: Towards an Industry Control Methodology, Wiley, New York, 1997
Chapter 6
Neural Networks for Signal Processing
in Measurement Analysis
and Industrial Applications:
the Case of Chaotic Signal Processing
Vladimir GOLOVKO, Yury SAVITSKY, Nikolaj MANIAKOV
Laboratory of Artificial Neural Networks, Brest State Technical University
Moskovskaja str. 267, 224017 Brest, Belarus
Abstract. This chapter discusses the use of neural networks for signal processing. In
particular, it focuses on one of the most interesting and innovative areas: chaotic
time series processing. This includes time series analysis, identification of chaotic
behavior, forecasting, and dynamic reconstruction. An overview of chaotic signal
processing by both conventional and neural network methods is given.
6.1. Introduction
Neural techniques have been successfully applied to many problems in the area of signal
processing. Different goals and perspectives can be considered in manipulating data
sequences generated by physical processes.
Signal filtering is a classical technique to modify the characteristics of the signal itself.
Like in traditional approaches, in the neural approaches the signal is observed through a
sampling window sliding in time over the signal itself: whenever the observation window
photographs a set of signal samples, filtering or transformation is applied and generates the
output view of the incoming signal. Many practical examples have been reported in the
literature related to various application areas (e.g., in electronics, electrical engineering,
mechanical systems, chemical plants, biomedical systems, radio transmissions).
Noise cancellation in a continuous signal is one of the most interesting applications,
desirable in a wide variety of practical cases. To reduce the noise we can use a
finite-duration impulse response (FIR) filter [5], the transformation from the spatial to the
frequency space by means of the Discrete Fourier Transform, the Wavelet Shrinkage
method [11], or median filtering [12]. However, these techniques are not always efficient.
The use of the ICA (independent component analysis) neural network for extracting
noise-free data has been shown to be a powerful approach [13]. In the case of Gaussian
data the PCA (principal component analysis) neural network was shown to be efficient [14];
it can also be adopted for data compression and reduction.
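Of the classical techniques listed above, median filtering [12] is the simplest to illustrate. A minimal sliding-window sketch, where the window length and the edge-padding scheme are illustrative choices:

```python
def median_filter(signal, window=3):
    """Slide an odd-length window over the signal and output the median of
    each window, which suppresses impulsive noise while preserving edges."""
    assert window % 2 == 1, "use an odd window length"
    half = window // 2
    # Pad by repeating the endpoints so the output has the input's length.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [sorted(padded[i:i + window])[half] for i in range(len(signal))]

# A single impulsive spike is removed while the step edge survives:
clean = median_filter([0.0, 0.0, 9.0, 0.0, 1.0, 1.0, 1.0])
```

Unlike a linear FIR smoother, the median leaves a clean step exactly in place, which is why it is often preferred for impulsive (salt-and-pepper style) disturbances.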
Prediction and cross-correlation abilities of the neural network can be used to
reconstruct signals whenever the noise makes the signal poorly understandable or when the
sensor observing the signal is occasionally or temporarily not working properly.
Neural implementation of some classical transformations (e.g., Walsh, Hough) has been
also studied to exploit the adaptivity of the neural paradigms in configuring the filter
coefficients; harmonic signal analysis by neural networks is another high-level
transformation.
Signal processing can also be used to extract relevant information from the input signal,
e.g., to detect the occurrence of characteristic waveforms, pulses, spikes, and regularities.
Several applications are known in speech and sound processing (e.g., automatic typewriters;
phoneme and word recognition; speech understanding; automatic translators; voice and
sound compression, equalization and manipulation; voice and sound synthesis). Other
industrial applications are related to identification of the operating conditions of machinery,
plants, and production processes by observing sensor data, and to data cleaning for system
diagnosis.
Overviews of different types of neural networks suited for signal processing, as well as
overviews of their effective applications, can be found in [4–14]. Feedforward neural
networks are among the most used, especially the multilayer perceptron (MLP) and the
radial basis function (RBF) networks [7–9]. Both of these network types have been shown
to be universal function approximators [9] and, consequently, are very appropriate for signal
processing applications. Another family suited for dynamic modeling is the recurrent neural
network (also called the time-delay neural network), which has been used, e.g., in nonlinear
prediction and modeling, adaptive equalization of communication channels, speech
processing, and measurement [10].
In many real systems (e.g., compound pendula, dripping faucets, predator-prey
ecologies, measles epidemics, oscillating chemical reactions, irregular heart beats, stock
market, EEG patterns of brainwave activity, central nervous system, physical systems,
social behavior), a chaotic behavior has been observed, i.e., a complex, erratic, extremely
input-sensitive behavior which cannot be easily understood. Chaos theory is nowadays
widely studied and applied in various areas to describe, characterize, and possibly predict
the system behavior when such kind of complexity occurs. Due to the increasing interest in
these kinds of models and processing, this chapter focuses therefore on chaotic signal
processing.
In the system theory, chaotic systems are deterministic models that can be used to
describe random, noisy, unpredictable behaviors that are present in natural systems. The
behavior of a chaotic system is governed by simple deterministic nonlinear rules that are
iteratively applied to generate the next state from the current state and input values;
although these rules do not contain any noise, randomness, or probabilities, their repeated
application leads to very complex system behaviors in the long term, that cannot be
captured by simple global rules. In this sense, unpredictability "emerges" over time.
The chaotic behavior of a dynamical system can be described either by nonlinear
mathematical equations or by experimental data. Unfortunately, often we do not know the
nonlinear equations that describe the dynamical system. In general we have only
experimental signals from the unknown dynamical system. The problem consists therefore
in identifying the chaotic behavior and building a model that captures the important
properties of the unknown system by using only the experimental data. In order to
determine the main properties of our model, we can use the dynamic invariants (namely:
correlation dimension, Lyapunov exponents, and Kolmogorov entropy).
A chaotic system has a sensitive dependence on the initial conditions: starting from very
close initial conditions, a chaotic system may very rapidly move to different final states.
Another problem concerning chaotic signals is that they are unpredictable in the long
term, because an error at the beginning of the prediction increases exponentially in time [1,2].
An improvement of the prediction accuracy is therefore fundamental. Besides, it also
allows for understanding the observed behavior of a nonlinear system and for reconstructing
the space of the system states by taking into account the numerical data measured in the
system. This is based on the embedding theorem [3], which guarantees that the full
knowledge about the system behavior is contained in the time series of characteristic
quantities measured in the system; the complete multivariate phase space can be
constructed from these time series. The embedding theorem is characterized by some
parameters, namely the embedding dimension and the time delay. The estimation of these
parameters provides a maximum predictability of the chaotic time series and can be used to
choose the optimal window size (number of input samples) to perform forecasting. Neural
approaches have been shown effective in forecasting for chaotic systems: examples and
techniques will be described in this chapter.
A further problem concerns the chaotic time series processing by using only observed
data. From small data samples it is in fact very difficult to reconstruct the system dynamics
and to compute the Lyapunov's spectrum. Also in this case neural networks have proved
more powerful than traditional approaches for chaotic time series processing.
To tackle chaotic signal processing by means of neural paradigms, multilayer perceptron
(MLP) and radial basis function networks (RBF) [7-9] as well as time-delay neural
networks (TDNN) [10] can be applied both to chaotic signal identification and forecasting.
The ICA (independent component analysis) and the PCA (principal component analysis)
neural networks are suited for advanced filtering [13,14].
Processing of a chaotic signal can be divided into four stages, as shown in Fig. 1. In the
first stage the time series analysis is performed to extract the characteristics of the signal.
Then the embedding parameters are evaluated to identify the chaotic behavior. Prediction
can be performed on the identified model. Finally the phase space reconstruction can take
place or the neural network can be built for optimal forecasting.
Figure 1: Functional diagram of data processing (time series → identification of chaotic
behavior → prediction → attractor reconstruction).
The rest of the chapter is organized as follows. Section 2 reviews the use of multilayer neural
networks for signal processing. Section 3 discusses the nonlinear dynamical systems suited for
chaotic signal processing and the strange attractor, using Lorenz and Henon data. Section 4
presents different approaches for identification of chaotic behavior. Section 5 describes the time
series analysis, namely the computation of the embedding parameters. Section 6 tackles the
analytical approaches for computing the Lyapunov's exponents that characterize the system
chaoticity. Section 7 presents the neural network approach to determine the Lyapunov's spectrum,
having very low computational complexity and requiring small data sets. Section 8 introduces the
neural network approach for chaotic time series forecasting for individual data points. Section 9
discusses the use of neural network for state space reconstruction.
The multilayer perceptron (MLP) computes:

$$y_l(\mathbf{x}) = F\Big(\sum_{j=1}^{m} w_{lj}\, F\Big(\sum_{i=1}^{n} v_{ji}\, x_i - T_j\Big)\Big), \quad l = 1, \dots, p \qquad (1)$$

where F is the nonlinear activation function (sigmoid or other nonlinearity), m is the number of hidden
units, n and p are the numbers of input and output units respectively, v_ji and w_lj are the
weights, and T_j are the thresholds.
The radial basis function network (RBF) is described by:

$$y_l(\mathbf{x}) = \sum_{j=1}^{m} w_{lj}\, \exp\Big(-\frac{\|\mathbf{x} - \mathbf{c}_j\|^2}{\sigma_j^2}\Big), \quad l = 1, \dots, p \qquad (2)$$

where c_j and σ_j are the center and the width of the j-th radial unit. A common choice for
the nonlinearity F in (1) is the sigmoid:

$$F(s) = \frac{1}{1 + e^{-s}} \qquad (3)$$

A nonlinear autoregressive (NAR) process driven by the noise e(t) is modeled as:

$$x(t) = F\big(x(t-1), x(t-2), \dots, x(t-k)\big) + e(t) \qquad (4)$$
MLP and RBF networks can be effectively used to model NAR processes. These
networks can be configured by appropriate algorithms, e.g., backpropagation and its
advanced variations, conjugate gradients, and Levenberg-Marquardt.
The recurrent neural networks (RNN), also called time-delay neural networks (TDNN),
are extensions of the feed-forward neural networks, obtained by introducing time delays on
connections [10] as shown in Fig. 2. This approach was widely used in speech recognition,
nonlinear prediction, adaptive equalization of communication channels, and plant control.
According to the type of feedback loop, we distinguish the Jordan's network, the Elman's
network, and the multi-recurrent neural network.
The Jordan's network consists of a multilayer perceptron with one hidden layer and a
feedback loop from the output to additional inputs (or context). It computes a nonlinear
function of n past sequence elements and q past estimates:

$$\hat{x}(t+1) = F\big(x(t), x(t-1), \dots, x(t-n+1);\; \hat{x}(t), \hat{x}(t-1), \dots, \hat{x}(t-q+1)\big) \qquad (5)$$

where x(t) is the actual value and x̂(t) is the estimate fed back from the output. This model is the nonlinear
extension of the ARMA model, i.e., the combination of AR and MA (moving average)
components.
The Elman's network has a feedback loop from the hidden layer to additional inputs (or
states). This model is described by:

$$\mathbf{s}(t) = F\big(W_1\,\mathbf{x}(t) + W_2\,\mathbf{s}(t-1)\big), \qquad \hat{x}(t+1) = W_3\,\mathbf{s}(t) \qquad (6)$$

where the matrices W1, W2, W3 represent three sets of weights: from the input layer to the
hidden one, from the hidden layer to the state inputs, and from the hidden layer to the
output one.
The multi-recurrent neural network has a feedback both from the hidden and the output
layers to the input one. This model is represented by:

$$\hat{x}(t+1) = F\big(W_1\,\mathbf{x}(t) + W_2\,\mathbf{s}(t) + W_3\,\hat{\mathbf{x}}(t)\big) \qquad (7)$$
The dynamics of a continuous-time nonlinear system is described by the differential equation:

$$\frac{d\mathbf{x}(t)}{dt} = \Phi\big(\mathbf{x}(t), t\big) \qquad (8)$$

where x(t) = [x1(t), x2(t), ..., xn(t)] is the vector of the system state and Φ is the vector field.
The system is called autonomous if the vector field Φ does not change in time, i.e., Φ = Φ(x(t)).
The system state at any time is defined as a point in the n dimensional space. The vector
field maps a manifold to a tangent space. The integral curve (or trajectory) identifies a flow
on the manifold. These flow curves are called orbits.
In the case of linear differential equations the trajectories either asymptotically descend
in the phase space to a fixed point or are closed orbits as t → ∞. In the case of a
nonlinear function Φ, under suitable conditions the behavior is chaotic and the orbits
approach a complex subset called a strange attractor. Some strange attractors have a known
mathematical description (e.g., Lorenz's and Rossler's attractors, the Mackey-Glass chaotic
time series). Other attractors have been experimentally confirmed to be chaotic but there is
no known analytical description (e.g., fluid turbulence, gravity waves, EEG data).
As an example, the attractor of Lorenz system is shown in Fig. 3; it is described by the
following three coupled nonlinear differential equations:
$$\frac{dx}{dt} = \sigma\,(y - x), \qquad \frac{dy}{dt} = -xz + rx - y, \qquad \frac{dz}{dt} = xy - bz \qquad (10)$$

where σ=10, r=28, and b=8/3. Lorenz proposed this model for the atmospheric turbulence.
An attractor is a subset of the manifold to which an open subset of points (the basin of the
attractor) tends in the limit as t → ∞. Such systems are called dissipative systems.
The chaotic flow has a very sensitive dependence on the initial conditions, i.e., points
that are initially close to each other may exponentially diverge in time. In Fig. 4 two series of
a Lorenz system are shown: Series I starts from the initial point [0, 0.1, 0], while Series II
starts from [0.001, 0.1, 0]. A little change in the initial condition leads shortly to different
behaviors. This high sensitivity results in unpredictability of the chaotic systems in the long
term, since any little inaccuracy is later increased exponentially in time. However, it is
important to point out that both of the above series describe the same attractor.
When a dynamical system is described by a first-order differential equation, a chaotic
behavior is observed only if the dimension of the phase space is greater than 2. However, when
the dynamical system is described by a difference equation, chaotic behavior occurs
also in the 2-dimensional space, as for the Henon's map:

$$x_{n+1} = 1 - \alpha x_n^2 + y_n, \qquad y_{n+1} = \beta x_n \qquad (11)$$
Figure 5: Henon's attractor
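A minimal sketch of how such data can be produced: iterating the map (11) with α = 1.4 and
β = 0.3 from an arbitrary point of the basin, and discarding the transient so that the
remaining iterates lie on the attractor of Fig. 5 (the initial point and transient length
are illustrative choices).

```python
def henon_series(n, a=1.4, b=0.3, x0=0.0, y0=0.0, discard=100):
    """Return n points (x, y) of the Henon map, Eq. (11), on the attractor."""
    x, y = x0, y0
    pts = []
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:          # drop the transient toward the attractor
            pts.append((x, y))
    return pts

pts = henon_series(1000)          # bounded, aperiodic orbit
```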
As already said, the chaotic behavior can be described either by nonlinear mathematical
equations or by experimental data. However, in general only experimental samples of
signals produced by the unknown dynamical system are available. The problem consists
therefore of identifying the chaotic behavior from these samples and building a model that
captures the underlying dynamics of the unknown system. The correlation dimension, the
Lyapunov's exponents, and the Kolmogorov's entropy can be used as dynamic invariants.
Figure 7: Mapping of the consecutive maximum of the Lorenz's Z-series
Unfortunately, this method is not effective with noisy data: for small time
derivatives additional extrema can be produced by the perturbations induced by the noise
and, consequently, the behavior may become similar to a random process.
Another tool to verify the chaotic behavior is the Fourier transform:

$$X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-i\omega t}\, dt \qquad (12)$$

that transforms the function x(t) into the frequency spectrum. The power spectrum is
defined as P(ω) = |X(ω)|². For a periodic oscillation the power spectrum contains a finite
number of frequencies. For a chaotic process it is a broad band. It is worth noting that a
multi-harmonic power spectrum does not necessarily correspond to a chaotic system:
systems with a high number of degrees of freedom can generate a similar power spectrum.
When a system is represented by samples taken at discrete times over a period of 2^n points (as
in the case of time series), we can use the discrete Fourier transform:

$$X_k = \sum_{t=0}^{2^n - 1} x_t\, e^{-2\pi i k t / 2^n} \qquad (13)$$
In Fig. 8 the power spectrum of the Henon's X-series (for 1024 points) is
presented: it allows for distinguishing the chaotic motion from the multi-harmonic oscillation.
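The spectral test can be sketched as follows (using an FFT routine as an implementation
convenience, not part of the chapter): a pure sine with an integer number of cycles in the
window produces a single spectral line, while a chaotic series spreads its power over a
broad band.

```python
import numpy as np

def power_spectrum(x):
    # P_k = |X_k|^2 of the discrete Fourier transform, Eq. (13)
    return np.abs(np.fft.rfft(x)) ** 2

t = np.arange(1024)
sine = np.sin(2 * np.pi * 8 * t / 1024)    # exactly 8 cycles in the window
P = power_spectrum(sine)
peak = int(np.argmax(P))                   # a single line, at bin 8
```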
Another tool to check the process chaoticity is the autocorrelation function, which is
defined as follows for continuous and discrete signals, respectively:
$$C(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{0}^{T} \big(x(t) - \bar{x}\big)\big(x(t+\tau) - \bar{x}\big)\, dt \qquad (14)$$

$$C(\tau) = \frac{1}{N} \sum_{t=1}^{N-\tau} \big(x_t - \bar{x}\big)\big(x_{t+\tau} - \bar{x}\big) \qquad (15)$$
In practice these formulas need to be approximated since only a finite number of points is
available. The autocorrelation of a periodical signal produces a periodical function. For
chaotic or random signals the autocorrelation function rapidly descends to zero: the
autocorrelation function of the Lorenz's X-series is shown in Fig. 9.
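A minimal sketch of the discrete autocorrelation test of Eq. (15), normalized by C(0); the
first zero crossing of C(τ) is also one of the delay-selection criteria discussed below.
The sine test signal is an illustrative choice.

```python
import numpy as np

def autocorr(x, max_lag):
    # discrete autocorrelation of Eq. (15), normalized so that C(0) = 1
    x = np.asarray(x, dtype=float) - np.mean(x)
    c0 = np.dot(x, x) / len(x)
    return [float(np.dot(x[:len(x) - k], x[k:]) / len(x) / c0)
            for k in range(max_lag)]

def first_zero(c):
    for k in range(1, len(c)):
        if c[k] <= 0.0:
            return k
    return None

x = np.sin(0.1 * np.arange(2000))     # periodic toy signal
lag = first_zero(autocorr(x, 100))    # near a quarter period, pi / (2 * 0.1)
```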
Another technique to verify the chaotic behavior is the fractal dimension. The attractor
of a chaotic process at any time has in fact a fractal dimension. For the one-dimensional
observation of the process the correlation dimension D2 can be evaluated by the algorithm
presented in [1]; since this technique requires the use of embedding parameters, it will be
discussed in section 5.
The most common and effective test for chaotic behavior relies on the
Lyapunov's exponents. If the largest Lyapunov's exponent is positive, the process is chaotic;
if the sum of the whole Lyapunov's spectrum is negative, the system dissipates and converges to
the attractor. The computation both of the largest Lyapunov's exponent and of the Lyapunov's
spectrum will be discussed in section 6.
6.5. Embedding parameters
A dynamic source of chaotic signals is not fully represented by a one-dimensional
observation in the time domain since the chaotic dynamics take place in a phase space
having a higher number of dimensions (e.g., for a differential equation it is at least 3).
However, the phase space of a chaotic process can be reconstructed from only one time
series of observations by using the embedding parameters, as first shown in [2]: the points
of the time series and their differences (like derivatives) are used as coordinates to build the
state space. In [3] the formal proof is given, which is known as the following Time-Delay
Embedding Theorem. Let's consider a dynamical system having a solution (x(t), y(t), ...,
z(t)) in a d-dimensional phase space. By using only one coordinate x(t), under general
conditions it is possible to build a space of lag points (x(t), x(t+T), x(t+2T), ...,
x(t+(D−1)T)) that is a diffeomorphism between itself and the attractor of the
dynamical system in the real phase space. The dimension D satisfies D > 2[d_F]+1, where
d_F is the fractal dimension of the attractor and [·] is the integer part.
The condition D > 2[d_F]+1 is sufficient but not necessary for reconstructing the
dynamics. Besides, the above theorem assumes that the observable signal is noiseless.
However, in practice signals are noisy time series. Therefore, the experimental data
must be preprocessed in order to minimize the influence of noise on the subsequent
analysis. To this purpose a FIR filter or an ICA neural network can be used.
The embedding theorem states that even from a single measured signal it is possible to
reconstruct the state space that is equivalent to the unknown dynamical system.
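The lag-point construction of the theorem can be sketched in a few lines; D and T are
assumed to be already known (they can be estimated with the methods described below).

```python
def embed(x, D, T):
    """Build the lag vectors [x(t), x(t+T), ..., x(t+(D-1)T)] from a scalar series."""
    n = len(x) - (D - 1) * T
    return [[x[t + j * T] for j in range(D)] for t in range(n)]

vectors = embed(list(range(10)), D=3, T=2)
# first lag vector: [0, 2, 4]
```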
To reconstruct the state space the time delay T and the embedding dimension D must be
evaluated. The procedure for finding a suitable D is called embedding. Unfortunately, the
embedding theorem does not provide any guidance in choosing the embedding delay T.
The time delay T is the period between the components of the points in the reconstructed
phase space. The time delay T should be chosen so that the coordinates of the vectors
constituting the embedding space are independent, in order to obtain a faithful
reconstruction of the original phase space. In fact if T is too large, the dynamics at one time
step become disconnected from the dynamics at the next time step; consequently, the
components of the vector constituting the embedding space will be uncorrelated. The
dimension of the reconstructed attractor will be close to the dimension of the embedding
space [15] and the attractor will look very complex. This becomes noticeable in the
presence of noise: this case is called irrelevance [16]. If T is too small, all components of the
vector will be nearly the same and the attractor will lie close to the line of identity.
Consequently, all points will be indistinguishable: this case is called redundancy. All of
these cases lead to bad prediction of the chaotic time series.
There are various methods to evaluate the time delay T:
1. the autocorrelation function,
2. the average displacement method,
3. the mutual information.
The method based on the autocorrelation function C(T) is computationally efficient.
This approach uses the first zero (or a point which is very close to zero) of the
autocorrelation function. The components of the vectors x(t) and x(t+T) are thus
uncorrelated. Unfortunately, some functions do not reach their first zero in a short time or
even do not reach it at all. To avoid this drawback, Zeng advises to take T as the time at
which C(T) first falls below ε = 1/N² (where N is the number of points considered in the time
series) [17], while Holzfuss suggests to take T equal to the time at which the autocorrelation
function reaches its first minimum [18]. However, these methods do not usually lead to
good results because uncorrelated components are not necessarily independent.
The average displacement method [19] estimates the optimum expansion of the
reconstructed attractor from the identity line of the reconstructed phase space. To this
purpose the following function is used:

$$S(m, T) = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\sum_{j=1}^{m-1} \big(x(i + jT) - x(i)\big)^2} \qquad (16)$$
where N is the number of points in the time series, m is the dimension of the embedding
space, and T is the time delay. For a given m (m = 1, 2, ...), T is varied until a point in which
the function S(m, T) reaches a plateau is found. For each dimension of the embedding space
a time delay can thus be found.
For higher simplicity, the time delay T is usually chosen by using the method of mutual
information [20], derived from standard information theory [21]. The set of the time
series points is divided into m intervals. The suited number m of intervals is computed by
using the Sturges' formula: m ≈ log₂ N + 1 ≈ 3.32 log₁₀ N + 1, where N is the number of points
in the time series. The length l of each interval is l = (x_max − x_min)/m, where x_max and x_min are the
maximum and the minimum values of the time series, respectively. The mutual information
function is defined as:
$$I(T) = \sum_{i=1}^{m} \sum_{j=1}^{m} P_{ij}(T)\, \ln \frac{P_{ij}(T)}{P_i\, P_j} \qquad (17)$$
where P_i is the probability to observe a value of the time series in the i-th interval, and P_ij(T)
is the joint probability that the observed value is located in the i-th interval while the
subsequent observation after the time T falls in the j-th interval. The function I(T)
characterizes the probability of predicting x(t+T) from the observation of x(t). If the
mutual information is equal to zero, no information about x(t+T) can be extrapolated. This
is equivalent to looking for independence of the coordinate vectors x(t) and x(t+T).
Unfortunately, it is not possible to find a point in which the mutual information function
becomes zero: consequently, the time delay is taken equal to the first minimum of this
function. The first minimum of the mutual information function of the Lorenz's X-series is
at T = 0.16 (Fig. 10).
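The procedure just described can be sketched as follows (the sine test signal and the
bin-edge handling are illustrative assumptions; for a sine the first minimum of I(T) falls
near a quarter period).

```python
import math

def mutual_information(x, T, m):
    """Histogram estimate of I(T) between x(t) and x(t+T), as in Eq. (17)."""
    lo, hi = min(x), max(x)
    def bin_of(v):
        return min(int((v - lo) / (hi - lo) * m), m - 1)
    n = len(x) - T
    pi, pj, pij = [0.0] * m, [0.0] * m, {}
    for t in range(n):
        i, j = bin_of(x[t]), bin_of(x[t + T])
        pi[i] += 1.0 / n
        pj[j] += 1.0 / n
        pij[(i, j)] = pij.get((i, j), 0.0) + 1.0 / n
    return sum(p * math.log(p / (pi[i] * pj[j])) for (i, j), p in pij.items())

def first_minimum(series, max_T, m):
    vals = [mutual_information(series, T, m) for T in range(1, max_T + 1)]
    for k in range(1, len(vals) - 1):
        if vals[k - 1] > vals[k] <= vals[k + 1]:
            return k + 1                   # lags are 1-based
    return None

x = [math.sin(0.3 * t) for t in range(3000)]
m = int(math.log2(len(x))) + 1             # Sturges' rule: m = log2(N) + 1
lag = first_minimum(x, 30, m)              # near pi / (2 * 0.3), i.e. lag 5
```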
Figure 10: Mutual information I(T) versus T for the Lorenz's X-series
The correlation dimension is defined as:

$$D_2 = \lim_{r \to 0} \frac{\ln Cor(r)}{\ln r} \qquad (19)$$

where Cor(r) is the probability that a distance shorter than r separates a pair of randomly
chosen points [23]. For the points x₁, x₂, ..., xₙ in the phase space, Cor(r) is approximated by:

$$Cor(n, r) = \frac{2}{n(n-1)} \sum_{i<j} \Theta\big(r - \|x_i - x_j\|\big) \qquad (20)$$

where Θ is the Heaviside step function.
Figure 11: Log-log diagram of Cor(n, r) vs. r for the Henon's X-series
The most popular method for estimating the embedding dimension is the False Nearest
Neighbors method [24]: the basic idea is related to the non self-intersection of the
reconstructed attractor. The original attractor in fact lies on a smooth manifold. A
self-intersection of the reconstructed attractor proves that it does not lie on a smooth manifold
and, thus, the reconstruction was not correct. For the principle of non self-intersection,
when the attractor is reconstructed successfully in R^m, then all neighboring points in R^m
should also be neighbors in R^(m+1). The method verifies the neighbors for successively
higher values of the embedding dimension, until only a negligible number of false
neighbors is found when the dimension is increased from m to (m+1). Such m is chosen as
the smallest value of the embedding dimension that produces a reconstruction without
self-intersections [25].
Formally, for each point x(t) = [x(t), x(t+T), ..., x(t+(m−1)T)] of the time series the
nearest neighbor x(t_n) = [x(t_n), x(t_n+T), ..., x(t_n+(m−1)T)] is identified in the
reconstructed phase space of dimension m by using the Euclidean metric:

$$R_m(t, T) = \|x(t) - x(t_n)\|_m \qquad (21)$$

By considering the dimension (m+1), the distance between these points R_{m+1}(t, T) is
computed. Then, it is:

$$F_t = \sqrt{\frac{R_{m+1}^2(t, T) - R_m^2(t, T)}{R_m^2(t, T)}} = \frac{|x(t + mT) - x(t_n + mT)|}{R_m(t, T)} \qquad (22)$$
If F_t is greater than a given heuristic threshold, this point is marked as a false nearest
neighbor. By computing the percentage of false nearest neighbors for every dimension
m = 1, 2, ..., the dimension D having a percentage close to zero is identified. This is the
embedding dimension. Fig. 12 shows the diagram of the percentage of false nearest
neighbors versus the embedding dimension m for the Lorenz's X-series. In this case the
time delay T is 0.16. From this diagram the minimum embedding dimension for the
Lorenz's X-series can be evaluated to be equal to 5, where the percentage of false nearest
neighbors is 0.3%. A more detailed explanation of this method is given in [26] by using the
Gamma test [27].
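A sketch of the test on the Henon's X-series, whose attractor embeds in dimension 2; the
threshold value 10 and the brute-force neighbor search are heuristic, illustrative choices.

```python
import math

def fnn_fraction(x, m, T=1, threshold=10.0):
    """Fraction of false nearest neighbors in embedding dimension m."""
    n = len(x) - m * T                 # leave room for the (m+1)-th coordinate
    pts = [[x[t + j * T] for j in range(m)] for t in range(n)]
    false = 0
    for a in range(n):
        # nearest neighbor of point a in dimension m (brute force)
        b = min((i for i in range(n) if i != a),
                key=lambda i: math.dist(pts[a], pts[i]))
        rm = math.dist(pts[a], pts[b]) or 1e-12
        growth = abs(x[a + m * T] - x[b + m * T])   # new (m+1)-th coordinate
        if growth / rm > threshold:
            false += 1
    return false / n

# Henon X-series (transient discarded)
xs, x, y = [], 0.0, 0.0
for _ in range(400):
    x, y = 1.0 - 1.4 * x * x + y, 0.3 * x
    xs.append(x)
xs = xs[100:]
f1, f2 = fnn_fraction(xs, m=1), fnn_fraction(xs, m=2)   # f2 drops near zero
```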
Figure 12: Percentage of false nearest neighbors versus the embedding dimension m
for the Lorenz's X-series
$$\lambda_i = \lim_{t \to \infty} \frac{1}{t} \ln \frac{l_i(t)}{l_i(0)} \qquad (24)$$

where l_i(0) and l_i(t) are the lengths of the i-th axis at the initial time and at the time t,
respectively. Therefore every Lyapunov's exponent characterizes the modification of the
principal axis of the ellipsoid. In an n-dimensional chaotic system the sum of the n
Lyapunov's exponents is negative for dissipative systems. The positive exponents are
responsible for the sensitivity to initial conditions. The sum of the positive Lyapunov's
exponents is equal to the Kolmogorov's entropy.
As already said, the most common test to verify the chaotic behavior consists of
checking the highest Lyapunov's exponent: the system is chaotic if such an exponent is
positive. When the dynamical system is described by known equations, the highest
Lyapunov's exponent can be easily evaluated, e.g., by using the algorithm given in [28].
Let's consider a dynamical system described by the discrete mapping x_{n+1} = F(x_n),
where x is the state vector and n is the index of the discrete time. Starting from an
arbitrary point in the basin of attraction, the mapping is iterated until the obtained point lies on
the attractor. This point is x₀. A nearby point is x̃₀ = x₀ + Δx₀, where ||Δx₀|| = ε (||·||
denotes the Euclidean metric). By iterating both points over the interval T, the points x_T
and x̃_T are derived: the distance vector Δx_T = x̃_T − x_T, having length d₁ = ||Δx_T||, measures
the distance between these two points, and d₁/ε characterizes the variation of the perturbation
vector at the time T. Then, the point x_T is assumed as the new point x₀ and a new point x̃₀
is taken in the direction of the vector Δx_T so that ||x̃₀ − x₀|| = ε. By repeating these operations
the new length d₂ is obtained. After M steps the factor that modifies the amplitude of the
perturbation is given by:

$$s = \prod_{k=1}^{M} \frac{d_k}{\varepsilon} \qquad (25)$$

The highest Lyapunov's exponent can therefore be estimated as:

$$\lambda_1 \approx \frac{1}{MT} \ln s = \frac{1}{MT} \sum_{k=1}^{M} \ln \frac{d_k}{\varepsilon} \qquad (26)$$
with M large enough. As a consequence of using a large value for M, the computational
complexity becomes high. To limit the computational effort, the number of iterations
should be smaller. To this purpose the variation in time of the logarithm of the distance d
between two nearby points x₀ and x̃₀ = x₀ + Δx₀ is computed. By taking into account only the
values d < 1, the straight regression line is identified and its slope can be computed. This
estimated slope gives the approximate value of the highest Lyapunov's exponent. For example,
the highest Lyapunov's exponent for the Henon's system is 0.418, while for the Lorenz's
system it is 0.906.
Unfortunately, for one-dimensional time series the equations that describe the process
are not known. To compute the highest Lyapunov's exponent a different approach must be
adopted. First of all, the phase space must be reconstructed from the observation x(t) by
using the technique presented above. After having estimated the embedding dimension D
and the time delay T, the lag space [x(t), x(t+T), ..., x(t+(D−1)T)] is built. By taking an
arbitrary point in the attractor the nearest point according to the Euclidean metric is
identified among the other lag-points. Then the variations of the logarithm of the distance d
between these two nearby points are evaluated as discussed above. Finally, the highest
Lyapunov's exponent is computed.
The conventional approach to compute λ is as follows:
1. Let's start from two points in the basin of attraction that are separated by the distance d₀.
Usually, d₀ is less than 10⁻⁸.
2. Execute one iteration for each orbit and compute the new divergence between the
corresponding trajectories by using the Euclidean metric. Then evaluate ln d₁.
3. Step 2 is repeated for n points; ln d₂, ln d₃, ..., and ln dₙ are computed.
4. Plot the diagram of ln d versus n.
5. By using the least squares method the straight regression line is drawn, taking into
account only the points having ln d < 0. The slope of the regression line estimates the
highest Lyapunov's exponent.
Estimating λ by using this algorithm is in general difficult because the initial
divergence d₀ is less than 10⁻⁸. This approach can be used on experimental data only when
the sequence of data is very long: unfortunately, this is usually very difficult to achieve
in real cases. To overcome this limit, neural networks can be effectively adopted to
estimate the highest Lyapunov's exponent.
Another fundamental problem in chaos theory is the computation of the complete
Lyapunov's spectrum. Its numerical computation can be performed by means of the algorithm
presented in [29]. For its estimation the exponential growth of the principal axes of the
ellipsoid must be evaluated.
Let's consider a dynamical system described by n equations; for example, let n = 3.
Let's take any point x₀ in the attractor as the initial point. The orthonormal frame x̃₀, ỹ₀, z̃₀ is
used for the initial perturbation vectors. After the time T, the trajectory arrives at the point
x₁ and the perturbation vectors become x̃₁, ỹ₁, z̃₁. The vectors must be reorthonormalized by
using the Gram-Schmidt procedure:

$$x'_1 = \frac{\tilde{x}_1}{\|\tilde{x}_1\|}, \qquad
y'_1 = \frac{\tilde{y}_1 - (\tilde{y}_1 \cdot x'_1)\, x'_1}{\|\tilde{y}_1 - (\tilde{y}_1 \cdot x'_1)\, x'_1\|}, \qquad
z'_1 = \frac{\tilde{z}_1 - (\tilde{z}_1 \cdot x'_1)\, x'_1 - (\tilde{z}_1 \cdot y'_1)\, y'_1}{\|\tilde{z}_1 - (\tilde{z}_1 \cdot x'_1)\, x'_1 - (\tilde{z}_1 \cdot y'_1)\, y'_1\|} \qquad (27)$$

Then the point x₁ and the perturbation vectors x'₁, y'₁, z'₁ are considered. During the next
time interval T, the new perturbation vectors x̃₂, ỹ₂, z̃₂ are obtained. They must be
reorthonormalized again. After M steps, the Lyapunov's exponents can be computed as:

$$\lambda_i \approx \frac{1}{MT} \sum_{k=1}^{M} \ln \|\tilde{v}_i^{(k)}\| \qquad (27)$$

where ‖ṽᵢ⁽ᵏ⁾‖ is the length of the i-th perturbation vector after the k-th Gram-Schmidt
orthogonalization but before its normalization, and M should be rather large. By using this
method the Lyapunov's exponents for the Henon's system are equal to 0.418 and −1.622, while
the ones for the Lorenz's system are 0.906, 0, and −14.572.
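The Gram-Schmidt spectrum computation can be sketched for the Henon's map; as a
simplification, the perturbation vectors are evolved here with the analytic Jacobian
[[−2αx, 1], [β, 0]] instead of finite ε-perturbations.

```python
import math

a, b = 1.4, 0.3
x, y = 0.0, 0.0
for _ in range(1000):                      # reach the attractor
    x, y = 1.0 - a * x * x + y, b * x

v1, v2 = (1.0, 0.0), (0.0, 1.0)            # initial orthonormal perturbations
s1 = s2 = 0.0
M = 5000
for _ in range(M):
    j11, j12, j21, j22 = -2.0 * a * x, 1.0, b, 0.0   # Jacobian at (x, y)
    v1 = (j11 * v1[0] + j12 * v1[1], j21 * v1[0] + j22 * v1[1])
    v2 = (j11 * v2[0] + j12 * v2[1], j21 * v2[0] + j22 * v2[1])
    x, y = 1.0 - a * x * x + y, b * x
    # Gram-Schmidt: normalize v1, remove its component from v2, normalize
    n1 = math.hypot(*v1)
    v1 = (v1[0] / n1, v1[1] / n1)
    dot = v2[0] * v1[0] + v2[1] * v1[1]
    v2 = (v2[0] - dot * v1[0], v2[1] - dot * v1[1])
    n2 = math.hypot(*v2)
    v2 = (v2[0] / n2, v2[1] / n2)
    s1 += math.log(n1)                     # accumulated log-lengths
    s2 += math.log(n2)

lam1, lam2 = s1 / M, s2 / M                # close to 0.418 and -1.622
```

Note that lam1 + lam2 equals ln|det J| = ln 0.3 exactly, a useful sanity check on the
implementation.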
6.7. A neural network approach to compute the Lyapunov's exponents
The use of neural networks for computing the highest Lyapunov's exponent and the
Lyapunov's spectrum was presented in [30]; it relies on the evaluation of the divergence
between two orbits n steps ahead by means of an iterative approach.
The neural network for the highest Lyapunov's exponent is a multilayer network with
k ≥ D − 1 input units (where D is the embedding dimension), p hidden units, and one output
unit (Fig. 13). This network is trained by means of the sliding window method:

$$x(t + i\tau) = F\big(x(t + (i-1)\tau),\, x(t + (i-2)\tau),\, \dots,\, x(t + (i-k)\tau)\big), \quad i = 1, \dots, n \qquad (28)$$
Starting from any point of the state space, this neural network approaches the attractor as
closely as desired. The highest Lyapunov's exponent can therefore be computed from a small
data set as follows [30]:
1. From the training set a point [x(t), x(t+T), ..., x(t+(D−2)T)] that lies nearby the attractor is
chosen and its trajectory x(t+(D−1)T), x(t+DT), ... is computed by using the
multistep prediction.
2. In the reconstructed phase space the nearby point [x(t), x(t+T), ..., x(t+(D−2)T)+d₀], where
d₀ ≈ 10⁻⁸, is selected and its behavior x'(t+(D−1)T), x'(t+DT), ... is predicted by
using the neural network.
3. Define ln dᵢ = ln|x'(t+(D−2+i)T) − x(t+(D−2+i)T)|, i = 1, 2, ..., and mark the points
for which ln dᵢ < 0.
4. Plot the diagram of ln dᵢ versus iT.
5. Build the regression line for the marked points and compute its slope, which is equal to
the highest Lyapunov's exponent.
By using this technique the highest Lyapunov's exponents for the Henon's and the Lorenz's
time series are 0.43 and 0.98, respectively. Only the X-series has been used in both cases;
the size of the data set was 70 and 100 points, respectively. These results are very close to the
actual values computed in the previous section. This method is highly advantageous as far as
computational complexity, accuracy, and data set size are concerned. Figs. 14 and 15
represent the diagram of ln dᵢ versus iT and the straight regression line for the Henon's and the
Lorenz's X-series, respectively.
Starting from a given initial condition, this network is able to compute the state of the
dynamical system at any time, as well as to describe the evolution of the phase trajectory
points. At each step the Gram-Schmidt orthogonalization procedure must be used to adjust
the output vectors. Let |w_i(t)| be the length of the i-th vector at the time t. This length
characterizes the value of the vector along the i-th ellipsoid axis. Thus, the i-th Lyapunov's
exponent is given by:

$$\lambda_i = \lim_{t \to \infty} \frac{1}{t} \ln \frac{|w_i(t)|}{|w_i(0)|} \qquad (29)$$
The corresponding length |w_i(t)| can be evaluated by using a neural network and,
consequently, the Lyapunov's exponents can be estimated. The algorithm to compute the
complete Lyapunov's spectrum is as follows:
1. Take the initial point N(0) = [x₁(0), x₂(0), ..., xₙ(0)] from the basin of attraction.
2. Choose a small value ε ≈ 10⁻⁸ and define the coordinates of the next n points as follows:

$$A_1(0) = [x_1(0)+\varepsilon,\; x_2(0),\; \dots,\; x_n(0)]$$
$$A_2(0) = [x_1(0),\; x_2(0)+\varepsilon,\; \dots,\; x_n(0)] \qquad (30)$$
$$\dots$$
$$A_n(0) = [x_1(0),\; x_2(0),\; \dots,\; x_n(0)+\varepsilon] \qquad (31)$$

3. Compute the length of each perturbation vector: |Aᵢ(0) − N(0)| = |wᵢ(0)| = ε, where i = 1, ..., n.
4. At the time t = 0, use the set of points N(0), A₁(0), A₂(0), ..., Aₙ(0) as the input vectors of
the neural network. The output produced by the predicting network is the set of the
coordinates of the points at the next time t = 1:

$$N(1) = [x_1(1, N),\, \dots,\, x_n(1, N)], \qquad A_i(1) = [x_1(1, A_i),\, \dots,\, x_n(1, A_i)] \qquad (32)$$

where x_j(1, A_i) is the j-th coordinate of the point A_i at the time t = 1. This leads to the next
set of perturbation vectors:

$$w_i(1) = A_i(1) - N(1) = [w_{1i},\, w_{2i},\, \dots,\, w_{ni}] \qquad (33)$$

where w_ji is the j-th coordinate of the i-th vector.
5. The basis [w₁(1), w₂(1), ..., wₙ(1)] is transformed into an orthonormal frame by using the
Gram-Schmidt algorithm, as follows:
a) The first vector of the orthonormal frame is chosen as:

$$w'_1(1) = \frac{w_1(1)}{|w_1(1)|} \qquad (34)$$

where $|w_1(1)| = \sqrt{w_{11}^2 + w_{21}^2 + \dots + w_{n1}^2}$.
b) The subsequent vectors are defined by the following recurrent formulas:

$$w'_i(1) = \frac{w_i(1) - \sum_{j=1}^{i-1} \big(w_i(1) \cdot w'_j(1)\big)\, w'_j(1)}{\Big| w_i(1) - \sum_{j=1}^{i-1} \big(w_i(1) \cdot w'_j(1)\big)\, w'_j(1) \Big|} \qquad (35)$$

where i = 2, ..., n.
c) Compute the new points:

$$A_i(1) = N(1) + \varepsilon\, w'_i(1), \quad i = 1, \dots, n \qquad (36)$$

The result is the new set of points N(1), A₁(1), ..., Aₙ(1) (37), which is fed back to the
network so that the whole procedure can be iterated. After p steps, the i-th Lyapunov's
exponent is estimated from the lengths of the perturbation vectors before renormalization:

$$\lambda_i \approx \frac{1}{p} \sum_{t=1}^{p} \ln p_t^{(i)}, \qquad p_t^{(i)} = \frac{|w_i(t)|}{\varepsilon} \qquad (38),\ (39)$$
By using this approach, the Lyapunov's exponents of the Henon's time series are 0.442 and
−1.625 (the actual values are 0.418 and −1.622, respectively). For the Lorenz's time series
they are 0.777, 0.003, and −14.472 (the actual values are 0.906, 0, and −14.572,
respectively). Figs. 17 and 18 show the dependence of λᵢ on p for the Henon's and the
Lorenz's time series, respectively.
Figure 17: Estimation of the Lyapunov's spectrum for the Henon's time series
Figure 18: Estimation of the Lyapunov's spectrum for the Lorenz's time series
$$\hat{x}(t) = F\big(x(t-1),\, x(t-2),\, \dots,\, x(t-k)\big) \qquad (40)$$

where t = k+1, ..., N, F is the nonlinear prediction function, and k is the size of the sliding
window.
The Multilayer Perceptron (MLP) can be effectively adopted for time series prediction,
also in the chaotic case. The input layer is composed of at least (D−1) units (where D is the
embedding dimension), while one output unit delivers the predicted value. The network is
trained by using the known data sequence [x(t), x(t+T), ..., x(t+(D−2)T)] to generate the
predicted output x(t+(D−1)T). This structure of the predicting network derives directly from
the meaning of embedding. When the time series has been learnt by ("embedded in") the
neural network, a manifold is obtained in the D-dimensional phase space such that, for every
(D−1) coordinates of any point, the D-th coordinate is produced, and nearby points in the
first (D−1) dimensions have very close D-th coordinates (i.e., the mapping is smooth).
To obtain the maximum predictability the embedding parameters must be defined. Let's
consider the Lorenz's and the Henon's attractors as chaotic systems to be modeled. The
Lorenz's attractor is defined by the three-coupled differential equations (10); this system is
chaotic for σ=10, r=28, and b=8/3. Equations (10) can be solved by using a 4th-order
Runge-Kutta approach with time step 0.01; Fig. 4 shows the Lorenz's time series (x-axis).
The mutual information allows for computing T=0.16, while the method of the false nearest
neighbors evaluates the embedding dimension D=5. The window size must be k ≥ D−1 = 4.
The Henon's attractor is described by the equations (11), where the chaotic behavior occurs
for α=1.4 and β=0.3; Fig. 19 shows the Henon's X-series. By using the same reasoning
discussed above, the window size is k ≥ 2 and T=1.
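The sliding-window scheme of Eq. (40) can be sketched on the Henon's X-series with window
k = 2. The chapter trains an MLP; as a minimal stand-in, a least-squares model on quadratic
features of the window is fitted here (the Henon's recurrence is itself quadratic, so the
one-step fit is near-exact, while the iterated prediction still degrades after enough steps,
which is the hallmark of chaos).

```python
import numpy as np

# generate the Henon X-series, Eq. (11), and drop the transient
xs, x, y = [], 0.0, 0.0
for _ in range(600):
    x, y = 1.0 - 1.4 * x * x + y, 0.3 * x
    xs.append(x)
xs = np.array(xs[100:])                     # 500 points

def feats(a, b):
    # quadratic features of the window [x(t-1), x(t-2)] -- illustrative choice
    return [1.0, a, b, a * a, a * b, b * b]

k = 2                                        # window size k >= 2
X = np.array([feats(xs[t - 1], xs[t - 2]) for t in range(k, 400)])
Y = xs[k:400]
w, *_ = np.linalg.lstsq(X, Y, rcond=None)    # fit the sliding-window predictor

# iterated multistep prediction from the last training window
win = [xs[398], xs[399]]
preds = []
for _ in range(100):
    win.append(float(np.dot(w, feats(win[-1], win[-2]))))
    preds.append(win[-1])
errs = [abs(p - xs[400 + i]) for i, p in enumerate(preds)]
# errs[0] is tiny; the error then grows exponentially with the iteration count
```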
Figure 20: The Henon's process. Prediction results for 30 predicting iterations
by using the retraining approach: (I) prediction, (II) original time series
Figure 21: The Lorenz's process. Prediction results for 30 predicting iterations
by using retraining approach: (I) prediction, (II) original time series
To perform forecasting at the level of individual points the MLP can be adopted. A
neural network with 7 input units, 5 sigmoid hidden units, and 1 linear output unit is
verified sufficient to perform this task [30]. Efficient backpropagation is used for training.
By using the iterative approach the Henon's and the Lorenz's data series have been
predicted for 1500 steps ahead; the training set consists of 1500 and 930 patterns for the
Henon's and the Lorenz's time series, respectively. Figs. 20 and 21 show the prediction
results 30 steps ahead for the Henon's and the Lorenz's time series, respectively:
prediction at the level of the individual data points is unreliable. This unpredictability is one
of the main characteristics of a chaotic system.
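The iterative (closed-loop) forecasting used here can be sketched as follows. The chapter trains a 7-5-1 MLP with backpropagation; as a dependency-free stand-in, this sketch fits a least-squares one-step predictor on quadratic features of the 7-sample window and then feeds each prediction back as an input, which is the mechanism being described.

```python
import numpy as np

# Henon x-series (a=1.4, b=0.3), as in the text.
def henon(n, a=1.4, b=0.3):
    x, y, out = 0.1, 0.1, []
    for _ in range(n + 100):
        x, y = 1.0 - a * x * x + y, b * x
        out.append(x)
    return np.array(out[100:])

s = henon(1600)
k = 7                                   # window size (7 input units)
X = np.array([s[i:i + k] for i in range(len(s) - k)])
y = s[k:]

# Stand-in one-step predictor: least squares on quadratic features
# (the chapter uses an MLP instead).
Phi = np.column_stack([X, X ** 2, np.ones(len(X))])
w, *_ = np.linalg.lstsq(Phi[:1500], y[:1500], rcond=None)

# Iterated forecasting: feed each prediction back as the newest input.
window = list(s[1500:1500 + k])
preds = []
for _ in range(30):
    v = np.array(window[-k:])
    p = np.concatenate([v, v ** 2, [1.0]]) @ w
    preds.append(p)
    window.append(p)
```

Because every predicted point replaces a true observation in the window, any prediction error is fed back and amplified at the rate set by the largest Lyapunov exponent.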
Prediction can span over a longer time than the individual point. The prediction horizon
is the interval of time in which an accurate forecast is feasible. As said before, chaotic
data are unpredictable in the long term because the measurement error on the initial
condition grows exponentially in time. Since this sensitive dependence is given by a
positive value of the highest Lyapunov's exponent, such a value determines the upper
prediction limit. It is well known that the sum of all positive Lyapunov's exponents is equal
to the Kolmogorov's entropy [31]; consequently, according to the chaos theory, the
prediction horizon is [31]:
T = (1/K) ln(1/d0)    (41)

where K = Σi λi is the Kolmogorov's entropy (the sum runs over the λi > 0) and d0 is the
initial prediction error.
According to equation (41), accurate prediction can be achieved only in the range T.
Therefore, after having trained the neural network, the prediction horizon for the given
initial point can be computed. Prediction will be performed within such a horizon to ensure
accuracy.
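Equation (41) is simple to evaluate once the Lyapunov's spectrum is known. A small sketch, where the exponent values for the Henon map are illustrative literature values (not computed here) and d0 is a hypothetical measurement error:

```python
import numpy as np

def prediction_horizon(lyap_exponents, d0):
    """Equation (41): T = (1/K) ln(1/d0), with K the sum of the
    positive Lyapunov exponents (the Kolmogorov entropy)."""
    K = sum(l for l in lyap_exponents if l > 0)
    return float(np.log(1.0 / d0) / K)

# Henon map: largest exponent ~0.42 (natural log units).
T = prediction_horizon([0.42, -1.62], d0=1e-6)
```

A smaller initial error d0 only enlarges T logarithmically, which is why long-term forecasting of chaotic data stays infeasible however accurate the measurement.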
To increase the prediction horizon a suitable retraining of the neural network can be
performed. Let's assume that the neural network was trained by using the data set
X = {x(1), x(2), ..., x(N)}. Prediction will be accurate only for T points ahead:
x(N+1), x(N+2), ..., x(N+T). The new training set for retraining is X' = {x(1), x(2), ..., x(N+T)}:
this allows for extending the prediction horizon. The effectiveness of this approach has
been tested on the Henon's and the Lorenz's time series. Tables 1 and 2 show the results
achieved with the iterative and the retraining approaches for the Henon's time series. MSE1
and MSE2 are the mean square errors for the predicted points x(N+1), x(N+2), x(N+3),
x(N+4) and x(N+5), x(N+6), x(N+7), x(N+8), respectively; MSE is the total mean square
error and NIT is the number of training iterations. Tables 3 and 4 show similar results for
the Lorenz's time series. The retraining approach usually achieves a better prediction
accuracy than the iterative approach and is effectively able to extend the prediction horizon.
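The retraining loop can be sketched as follows. As a hedge: the chapter refits an MLP, while this sketch substitutes a least-squares one-step predictor; after each block of T predictions, the T newly observed true points are appended to the training set and the model is refit, exactly as in the construction of X' above.

```python
import numpy as np

def henon(n, a=1.4, b=0.3):
    x, y, out = 0.1, 0.1, []
    for _ in range(n + 100):
        x, y = 1.0 - a * x * x + y, b * x
        out.append(x)
    return np.array(out[100:])

def fit(X, y):
    # Stand-in trainer (least squares on quadratic features); the chapter
    # trains an MLP with backpropagation instead.
    Phi = np.column_stack([X, X ** 2, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return lambda v: np.concatenate([v, v ** 2, [1.0]]) @ w

def retrain_forecast(series, n_train, k, horizon, n_blocks):
    """Predict n_blocks * horizon points; after each block the true points
    x(N+1)..x(N+T) join the training set and the model is refit."""
    preds, N = [], n_train
    for _ in range(n_blocks):
        train = series[:N]
        X = np.array([train[i:i + k] for i in range(len(train) - k)])
        model = fit(X, train[k:])
        window = list(train[-k:])
        for _ in range(horizon):
            p = model(np.array(window[-k:]))
            preds.append(p)
            window.append(p)
        N += horizon
    return np.array(preds)

s = henon(1200)
preds = retrain_forecast(s, n_train=1000, k=7, horizon=4, n_blocks=3)
```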
6.9. State space reconstruction
Let's finally consider the reconstruction of the state space of a chaotic process by using
neural networks in the presence of a small training data set. Figs. 3 and 5 show the original
Lorenz's and Henon's attractors, respectively.
A multilayer perceptron with 7 input units, 5 hidden units, and 1 output unit has been
used. The training set consists of 100 and 200 patterns for the Henon's and the Lorenz's
time series, respectively. With this neural network, after 3000 training iterations, the mean
square errors for the Henon's and the Lorenz's time series are 0.00033 and 0.0008,
respectively. Based on the iterative approach the Henon's and the Lorenz's data have been
predicted for 1500 steps ahead. The predicted Henon's and Lorenz's attractors are shown in
Figs. 22 and 23: the neural network is able to capture the underlying properties of the
chaotic behavior and, therefore, can be used for an accurate reconstruction of the state space
and an accurate prediction of the system behavior.

Table 1: Iterative and retraining approaches for training the predictive neural network for the Henon's series

Approach              Size of training set   NIT   T   MSE       MSE1        MSE2
Iterative approach    950                    308   4   3 10^-4   0.0002227   0.0311980
Retraining approach   954                    276   4   3 10^-4   0.0000332   0.0080427

Table 2: Predicted values for the Henon's series with the iterative and the retraining approaches

Approach      Desired value   Actual value   Absolute error
Iterative      0.363170        0.365621       0.002451
               1.002511        0.992627       0.009884
              -0.298088       -0.274204       0.023884
               1.176354        1.191078       0.014724
              -1.026758       -1.101723       0.074965
              -0.123019       -0.363043       0.240024
               0.670785        0.512435       0.158350
               0.333160        0.524174       0.191014
Retraining     0.363170        0.364677       0.001507
               1.002511        1.001295       0.001216
              -0.298088       -0.288775       0.009313
               1.176354        1.182933       0.006579
              -1.026758       -1.046040       0.019282
              -0.123019       -0.247162       0.124143
               0.670785        0.592083       0.078702
               0.333160        0.434126       0.100966

Table 3: Iterative and retraining approaches for training the predictive neural network for the Lorenz's series

Approach              Size of training set   NIT   T   MSE        MSE1        MSE2
Iterative approach    1000                   800   5   0.001357   0.0053618   0.1628954
Retraining approach   805                    578   5   0.0014     0.0011142   0.0698684

Table 4: Predicted values for the Lorenz's series with the iterative and the retraining approaches

Approach      Desired value   Actual value   Absolute error
Iterative     -0.163600       -0.155480       0.008120
              -0.617800       -0.556713       0.061087
              -1.633100       -1.573766       0.059334
              -0.439700       -0.536221       0.096521
               0.186400        0.085535       0.100865
               0.520500        0.237657       0.282843
               1.254000        0.719185       0.534815
               0.938200        1.509935       0.571735
               0.245600        0.461715       0.216115
               0.230800       -0.042810       0.273610
Retraining    -0.163600       -0.169124       0.005524
              -0.617800       -0.613167       0.004633
              -1.633100       -1.598149       0.034951
              -0.439700       -0.430258       0.009442
               0.186400        0.121533       0.064867
               0.520500        0.317614       0.202886
               1.254000        0.940051       0.313949
               0.938200        1.336355       0.398155
               0.245600        0.301510       0.055910
               0.230800        0.011798       0.219002
Figure 22: The predicted Henon's attractor: it was built on 1500 predicting iterations in the embedding space
Figure 23: The predicted Lorenz's attractor: it was built on 1500 predicting iterations in the embedding space
Let's finally summarize the overall approach to time series processing by using only
observable data. The global purpose of this approach is to identify the chaotic behavior,
predict the time series at the level of the individual points, and reconstruct the system
dynamics. The time series of observations is represented by X(t) = (X1(t), X2(t), ..., XF(t)) - or
shortly X(t) = Xi(t) - with t = 1, ..., p. The time series processing approach is as follows:
1. Select the data from a single observable Xi(t), t = 1, ..., p.
2. Compute the embedding delay τ and sample the time series by using this embedding
delay.
3. Compute the minimum embedding dimension D.
4. Build the multilayer perceptron having k > D-1 input units, a number of hidden units,
and one output unit.
5. Prepare the training data: X(t) = (x(t), x(t+τ), ..., x(t+(k-1)τ)) and Y(t) = x(t+kτ),
where X(t) is the input sequence and Y(t) is the desired output, for t = 1, ..., p.
6. Train the neural network by using an efficient version of the backpropagation
algorithm.
7. Compute the highest Lyapunov's exponent by using the neural network and identify
the chaotic behavior of the nonlinear system.
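Step 2, the mutual-information estimate of the embedding delay, can be sketched with a histogram-based estimator; the bin count, maximum lag, and the toy sinusoidal observable below are illustrative choices, not from the text.

```python
import numpy as np

def ami(series, max_lag=20, bins=16):
    """Histogram-based average mutual information I(tau), tau = 1..max_lag."""
    vals = []
    for lag in range(1, max_lag + 1):
        a, b = series[:-lag], series[lag:]
        h, _, _ = np.histogram2d(a, b, bins=bins)
        p = h / h.sum()
        px = p.sum(axis=1, keepdims=True)
        py = p.sum(axis=0, keepdims=True)
        nz = p > 0
        vals.append(float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum()))
    return np.array(vals)

def first_minimum(vals):
    """Lag of the first local minimum of I(tau) -> embedding delay."""
    for i in range(1, len(vals) - 1):
        if vals[i] < vals[i - 1] and vals[i] <= vals[i + 1]:
            return i + 1            # lags are 1-based
    return int(np.argmin(vals)) + 1

signal = np.sin(0.1 * np.arange(3000))    # toy observable
delay = first_minimum(ami(signal))
```

The first local minimum of I(τ) is the usual choice of embedding delay [20]; steps 3-7 then proceed with the false-nearest-neighbors dimension and the network training described above.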
6.10. Conclusion
In this chapter the fundamental aspects of chaotic time series processing have been
addressed, namely the determination of the embedding parameters, the Lyapunov's spectrum,
and the forecasting of chaotic data at the level both of the individual data points and of the
emergent structure. Both conventional and neural network approaches have been analyzed
for chaotic signal processing. In various domains neural networks have been shown to be
powerful tools with respect to conventional techniques. The neural approaches allow for
evaluating the Lyapunov's spectrum and for reconstructing the state space accurately and
efficiently by using only the observed data. Besides, the largest Lyapunov's exponent and the
Lyapunov's spectrum can be computed by neural networks even on small data sets; this
allows both for reducing the computational complexity and for limiting the observation time.
References
[1] P. Grassberger and I. Procaccia, Measuring the strangeness of strange attractors, Physica D 9, 1983.
[2] N. H. Packard, J. P. Crutchfield, J. D. Farmer and R. S. Shaw, Geometry from a Time Series, Physical
Review Letters 45, 1980, pp. 712-716.
[3] F. Takens, Detecting strange attractors in turbulence, Lecture Notes in Mathematics, Vol. 898,
Springer-Verlag, Berlin, 1980, pp. 366-381; and in Dynamical Systems and Turbulence, Warwick 1980,
eds. D. Rand and L. S. Young.
[4] S. Haykin, Signal processing: Where physics and mathematics meet, IEEE Signal Processing Magazine,
vol. 18, pp. 6-7, July 2001.
[5] S. Haykin, Adaptive Filter Theory, 4th Edition, Prentice-Hall, 2001.
[6] S. Haykin, Neural Networks: A Comprehensive Foundation, Second edition, Prentice-Hall, 1999.
[7] Cybenko G., Approximation by Superpositions of a Sigmoidal Function, Math. Control Signals Syst., 2,
pp. 303-314, 1989.
[8] Hertz J. A., Palmer R. G., Krogh A. S., Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, 1991.
[9] Hornik K., Stinchcombe M., White H., Multi-layer feedforward networks are universal approximators,
Neural Networks, 2, pp. 359-366, 1989.
[10] Waibel A., Consonant Recognition by Modular Construction of Large Phonetic Time-delay Neural
Networks, in Touretzky D. (ed.): Advances in Neural Information Processing Systems, Morgan Kaufmann,
Los Altos, CA, pp. 215-223, 1989.
[11] Donoho D. L., Johnstone I. M., Kerkyacharian G., Picard D., Wavelet shrinkage: asymptopia?, Journal of
the Royal Statistical Society, Series B, 57, pp. 301-337, 1995.
[12] Gonzalez R., Wintz P., Digital Image Processing, Reading, MA: Addison-Wesley, 1987.
[13] A. Hyvarinen, E. Oja, Independent component analysis: algorithms and applications, Neural Networks,
vol. 13, pp. 411-430, 2000.
[14] Oja E., Neural networks, principal components and subspaces, International Journal of Neural Systems,
1, pp. 61-68, 1989.
[15] A. M. Albano, J. Muench, C. Schwartz, A. I. Mees and P. E. Rapp, Singular-Value Decomposition and
the Grassberger-Procaccia Algorithm, Physical Review A 38, 1988, pp. 3017-3026.
[16] M. Casdagli, S. Eubank, J. D. Farmer and J. Gibson, State space reconstruction in the presence of noise,
Physica D 51, 1992, pp. 52-98.
[17] X. Zeng, R. Eykholt and R. A. Pielke, Estimating the Lyapunov-Exponent Spectrum from Short Time
Series of Low Precision, Physical Review Letters 66, 1991, pp. 3229-3232.
[18] J. Holzfuss and G. Mayer-Kress, An approach to error estimation in the applications of dimensional
algorithms, in Dimensions and Entropies in Chaotic Systems, editor G. Mayer-Kress, Springer-Verlag,
New York, 1986, pp. 114-122.
[19] M. T. Rosenstein, J. J. Collins, C. J. De Luca, Reconstruction expansion as a geometry-based framework
for choosing proper delay times, Physica D 73, 1994, pp. 82-98.
[20] A. M. Fraser and H. L. Swinney, Independent coordinates for strange attractors from mutual information,
Physical Review A 33, 1986, pp. 1134-1140.
[21] C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University Press, Urbana, Ill.
[22] H. D. I. Abarbanel, R. Brown, J. J. Sidorowich and L. Tsimring, The analysis of observed chaotic data in
physical systems, Reviews of Modern Physics, Vol. 65, 4, 1993, pp. 1331-1392.
[23] R. Castro, T. Sauer, Correlation dimension of attractors through interspike intervals, Physical Review E
55, 1997.
[24] M. B. Kennel, R. Brown and H. D. I. Abarbanel, Determining embedding dimension for phase-space
reconstruction using a geometrical construction, Physical Review A 45, 1992, pp. 3403-3411.
[25] D. Kugiumtzis, State Space Reconstruction Parameters in the Analysis of Chaotic Time Series - the
Role of the Time Window Length, 1996.
[26] M. Otani and A. J. Jones, Automated embedding and the creep phenomenon in chaotic time series,
2000.
[27] A. Stefansson, N. Koncar and A. J. Jones, A note on the Gamma test, Neural Computing and
Applications 5, 1997, pp. 387-393.
[28] G. Benettin, L. Galgani, J.-M. Strelcyn, Kolmogorov entropy and numerical experiments, Physical
Review A 14, 1976, pp. 2338-2345.
[29] G. Benettin, L. Galgani, A. Giorgilli, J.-M. Strelcyn, Lyapunov characteristic exponents for smooth
dynamical systems and for Hamiltonian systems: A method for computing all of them. P. I: Theory. P.
II: Numerical applications, Meccanica, Vol. 15, 1980, pp. 9-30.
[30] V. Golovko, Y. Savitsky, N. Maniakov and V. Rubanov, Some Aspects of Chaotic Time Series
Analysis, Proceedings of the 2nd International Conference on Neural Networks and Artificial
Intelligence, October 25, 2001, Minsk, Belarus, pp. 66-69.
[31] H. Schuster, Deterministic Chaos: An Introduction, Physik-Verlag, Weinheim, 1984, p. 240.
Chapter 7
Neural Networks
for Image Analysis and Processing
in Measurements, Instrumentation
and Related Industrial Applications
George C. GIAKOS
Department of Electrical and Computer Engineering, The University of Akron
Akron, OH 44325-3904, USA
Kiran NATARAJ, Ninad PATNEKAR
Department of Biomedical Engineering, The University of Akron
Akron, OH 44325-0302, USA
Abstract. During the last decade, significant progress has been made in both the
theoretical aspects and the applications of neural networks to image analysis and
processing. In this paper, basic neural network algorithms as applied to the imaging
process, as well as their applications in different areas of technology, are presented,
discussed, and analyzed. Novel ideas towards the optimization of the design parameters
of digital imaging sensors utilizing neural networks are presented.
7.1. Introduction
Digital imaging is a process aimed at recognizing objects of interest in an image by utilizing
electronic sensors and advanced computing techniques, with the aim of improving image
quality parameters [16]. It presents intrinsic difficulties due to the fact that image
formation is basically a many-to-one mapping, i.e., the characterization of 3-d objects can be
deduced from either a single image or multiple images.
Several problems associated with low-contrast images, blurred images, noisy images,
image conversion to digital form, transmission, handling, manipulation, and storage of
large-volume images, led to the development of efficient image processing and recognition
algorithms. Digital imaging or computer vision involves image processing and pattern
recognition techniques [16]. Image processing techniques deal with image enhancement,
manipulation, and analysis of images. The advantages of digital imaging are shown in
Table 1.
Table 1: Advantages of Digital Imaging
Digital image processing methods arise from two principal application areas:
a) improvement of image content for human interpretation and processing, and
b) processing of scene data for machine perception.
Some of these image processing methods include:
i) digitization and compression
ii) enhancement, restoration, and reconstruction, and
iii) matching, description, and recognition.
On the other hand, pattern recognition deals with object identification from observed
patterns and images. In the last few years, significant advances have been made in pattern
recognition through the use of several new types of computer architectures that utilize very
large-scale integrated circuits (VLSI) and solid-state memories, with a variety of parallel
high-speed computers, optical and opto-digital computers, as well as a variety of neural
network architectures and implementations. Artificial neural networks have shown great
strength in solving problems that are not governed by rules, or in which traditional
techniques have failed or proved inadequate. The inherent parallel architecture and the
fault-tolerant nature of the ANN can be maximally utilized to address problems in a variety of
application areas related to the imaging field [10, 11]. Artificial neural networks find their
application in pattern recognition (classification, clustering, feature selection), texture
analysis, segmentation, image compression, color representation, and several other aspects
of image processing [2-13], with applications in medical imaging, remote sensing,
aerospace, radars, and military applications [14-65].
7.2. Digital imaging systems
Digital systems with increased contrast sensitivity capabilities and a large dynamic range are
highly desirable [1].
By defining contrast as the perceptible difference between the object of interest and the
background, the contrast sensitivity of an imaging system is the measure of its ability to
provide this perceptible difference. It can be an operator-dependent or -independent
parameter. In this study, the observer-independent contrast sensitivity was measured. Also,
it is very important that a detector system is capable of recording the wide range of signals
coming off the object. The dynamic range provides a quantitative measure of the detector
system's ability to image objects with widely varying attenuating structures. It is defined as
the ratio of the maximum signal to the minimum observable image signal. Mathematically,

DR = Smax / ΔSmin    (1)

where DR is the dynamic range, Smax is the maximum signal from the detector before
saturation or non-linearity occurs, and ΔSmin is the minimum detectable signal above the
noise threshold. Several digital imaging techniques have been developed for a wide range
of applications, such as aerospace, surveillance, sub-terrestrial, marine imaging, and
medical imaging applications.
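As a worked instance of equation (1), with purely illustrative detector numbers (not from the text):

```python
import math

# Equation (1): DR = Smax / dSmin, with hypothetical detector values.
s_max = 10_000.0    # maximum signal before saturation/non-linearity sets in
ds_min = 2.5        # minimum detectable signal above the noise threshold
dr = s_max / ds_min             # dynamic range, 4000:1
dr_db = 20.0 * math.log10(dr)   # the same figure expressed in decibels
```

A detector with a 4000:1 dynamic range can thus image structures whose attenuations differ by roughly 72 dB before either saturation or the noise floor is reached.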
Applications range from imaging systems in the visible and infrared through x-rays,
MRI, ultrasound, sonar, and radar applications, as shown in Table 2.
Among the detector types are hybrid sensors (a combination of more than one detector
medium, such as gas/solid).
The applications of the imaging sensors are summarized in Table 3.

Table 3: Imaging Sensors Applications

AREA           APPLICATIONS
Military       Reconnaissance, target acquisition, fire control, navigation
Civil          Law enforcement, fire fighting, border patrol
Medical        -
Environmental  -
Industrial     Maintenance, manufacturing, non-destructive testing
Aerospace      Structural
The parameters affecting imaging performance include: physical parameters, geometrical
parameters, system parameters, observer experience, atmospheric transmittance, monitor
parameters, scene content (target characteristics, background characteristics, motion,
clutter, noise), and miscellaneous factors.
No single model can account for all the factors listed. Using a model to predict
performance for scenarios where the model is not validated can lead to inaccurate
predictions. Often several techniques are used and the results are combined. For instance,
Russo and Ramponi [82] proposed robust fuzzy methods for multisensor data fusion.
Similarly, physiologically motivated pulse coupled neural network (PCNN)-based image
fusion modeling can be used to fuse the results of several object detection techniques, with
applications in mammography and automatic target recognition [77].
7.4. Multisensor image classification
Applications of ANNs to the classification of multisensor data have been reported in
several works [75, 76]. Multisensor image classification relies on the application of structured
neural networks to the supervised classification of multisensor images. This technique can
be applied in cases where different sensors are used to extract information from the same
image, with applications in remote sensing, medical diagnosis, visual inspection and
monitoring of industrial products, robotics, and others. The main problems encountered by
conventional multisensor classification techniques are the difficulty of creating an
integral multivariate statistical model for different sensors and the absence of
compensatory mechanisms that automatically weight sensors according to their reliability.
These problems can be easily overcome by utilizing ANNs, since ANNs do not
require a-priori knowledge of the statistical data distribution and can take into
consideration the reliability of each sensor. Multi-input single-output tree-like networks
(TLNs), aimed at overcoming the difficulties related to architecture definition and
opacity, have been proposed [77]. The neural network architecture is shown in Fig. 1.
Figure 1: A TLN is dedicated to each class of data; the final classification is provided
by a Winner-Takes-All block [77].
Figure 2: Block diagram of Tree-like Networks applied for multisensor classification problems [77].
They have shown the robustness of this approach: highly degraded partial images
rapidly converged to the closest stored image. However, this research has not addressed the
issue of shift and rotational variance. They conclude that methods involving data
preprocessing are the most viable option.
Several researchers have developed high-performance image classification systems
based on ensembles of neural networks [8-14]. Most of the research has shown that an
ensemble of neural networks works best when the networks forming the ensemble
make different errors. Giacinto et al. [9] have improved on these models by using an
automated design to arrive at the best ensemble of neural networks for pattern
classification. Their method not only showed the effectiveness of their approach in image
classification but also provided a systematic method for choosing the neural networks
forming the ensemble. The Kohonen network (Fig. 3) provides an advantage over classical
pattern recognition techniques because it utilizes the parallel architecture of a neural
network and provides a graphical organization of pattern relationships.
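A minimal Kohonen training loop, showing the winner-takes-all selection and the neighborhood update that produce the graphical organization mentioned above; the grid size, schedules, and toy 2-D patterns are illustrative choices, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(8, 8), iters=2000, lr0=0.5, sigma0=3.0):
    """Minimal Kohonen map: winner-takes-all selection followed by a
    Gaussian neighborhood update on a 2-D grid of units."""
    h, w = grid
    units = rng.random((h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        winner = int(np.argmin(((units - x) ** 2).sum(axis=1)))
        frac = t / iters
        lr = lr0 * (1.0 - frac)                 # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5     # shrinking neighborhood
        d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
        nbh = np.exp(-d2 / (2.0 * sigma ** 2))
        units += lr * nbh[:, None] * (x - units)
    return units

data = rng.random((500, 2))       # toy 2-D patterns
units = train_som(data)
```

After training, neighboring grid units respond to similar patterns, which is the topology-preserving map the text refers to.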
Figure 3: A two-layer network (Kohonen learning).

[Figure: model of the visual pathways, with areas V1, V2, ..., and orientation-, direction-,
and motion-selective stages.]
pathways, the parvocellular pathway and the magnocellular pathway. The former pathway
processes color information, while the latter processes form and motion. The entry point of
an image is the retina, while the area marked LGN models the biological lateral geniculate
nucleus. The areas of the model labeled with the letter V model specific areas in the human
visual cortex, while the numbers indicate specialty areas which process selective
information such as color, form, or motion. Overall, this model exceeds the accuracy
obtained by the individual filtering methods.
7.6. Image shape and texture analysis
Many studies in the area of image processing are devoted to shape and texture analysis
[15,16], [18,19]. Ferrari et al. [15] used both shape and texture features from original
regions of interest in images to classify early breast cancer associated with
microcalcifications. They implemented different topologies of ANN and used the receiver
operating characteristic approach to analyze the performance of the ANN. The percentage
of correct diagnosis, either benign or malignant, was over 85%.
An adaptive neural network model [74] for distinguishing line and edge detection from
texture representation, for both biological and machine vision applications, is shown in Fig.
5. The model provides different representations of a retinal image in such a way that lines or
edges are distinguished from textures. Specifically, a hierarchy of adaptive Artificial
Neural Network (ANN) modules, the so-called Entropy Driven Neural Network (EDANN)
modules, is introduced for performing two essentially different tasks, namely line and edge
detection, and texture segregation. The texture segregation pathway is defined by the
EDANN1-, EDANN2, and EDANN3 modules, while the EDANN1+ and the EDANN4
modules define the line- and edge-detection pathway.
Figure 5: Simplified block diagram of the model (retinal image, filtering, and energy maps
feed the EDANN modules for orientation extraction, texture-boundary detection, and
filling-in).
[Figure: neural image-compression scheme; the original image is compressed from N
neurons to M neurons and decompressed back to the reconstructed image.]
Panagiotidis et al. [64] have used a neural network approach for lossy compression of
medical images (Fig. 8). They differentially code regions of interest, in contrast to the rest
of the image, to achieve high compression ratios. Specifically, the authors have developed
an efficient coding and compression scheme, which takes into consideration the difference
in visual importance between areas of the same image, by coding with maximum precision
regions of interest (ROI), while performing a lossy reconstruction of the low-interest areas.
A diagram of the hierarchical network used to classify the difference in visual importance
between areas, is shown in Fig. 9.
[Figure 9: hierarchical classification network; block DCT and edge-detection features feed
a high/low importance classification network, which drives the definition of the
quantization tables.]
Δw(t) = -η ∂E/∂w    (2)

where η is the gain factor. Including a momentum term,

Δw(t) = -η ∂E/∂w + α Δw(t-1)    (3)

where α is the momentum. The convergence speed is critically dependent on the gain
parameter η and the momentum α. With α fixed, the value of η is decreased during learning
according to the speed of convergence. The learning is eventually stopped when no further
improvement is obtained in the performance of the NN and η has reached a predefined
minimum.
This solution allows a fast convergence during the first part of the learning, and a
successively more accurate approach to the minimum.
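The momentum update with a gain decreased on stagnation can be sketched on a toy quadratic error; the target, the decay factor, and the iteration count below are illustrative choices, not the chapter's actual training setup.

```python
import numpy as np

def train(grad_fn, w, eta=0.1, alpha=0.5, eta_min=1e-4, iters=400):
    """Gradient descent with momentum (equation (3)); the gain eta is
    reduced whenever the error stops decreasing, down to a minimum."""
    dw = np.zeros_like(w)
    prev_err = np.inf
    for _ in range(iters):
        g, err = grad_fn(w)
        dw = -eta * g + alpha * dw      # equation (3)
        w = w + dw
        if err >= prev_err:             # no improvement: decrease the gain
            eta = max(0.7 * eta, eta_min)
        prev_err = err
    return w

# Toy quadratic error E(w) = ||w - t||^2 with a hypothetical target t.
t = np.array([1.0, -2.0])
def grad_fn(w):
    return 2.0 * (w - t), float(((w - t) ** 2).sum())

w_final = train(grad_fn, np.zeros(2))
```

The large initial gain gives the fast early convergence described above; the shrinking gain then allows the accurate final approach to the minimum.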
7.9. Linear neural networks for image compression
A 2-layer perceptron can be used, as in the previous section, but with no nonlinearity
at the node outputs.
The original images are fed into the input layer and the principal components of the set
of images are obtained at the output layer, so that a basis corresponding to the
Karhunen-Loeve Transform (KLT) is determined.
Interestingly enough, given a set of images, the most powerful linear technique is the
KLT. In this case, a basis for the linear space spanned by the images is found, in
which the basis vectors are ordered according to their importance, so that the energy lost
in the discarded coefficients is minimized (the basis is truncated, as in image
compression problems).
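The KLT (principal-component) view of linear compression can be illustrated on synthetic patch data; the patch construction and the number of retained components below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 correlated 64-pixel patches (8 repeated bands + noise).
base = rng.random((200, 8))
patches = np.repeat(base, 8, axis=1) + 0.05 * rng.random((200, 64))

# KLT basis: right singular vectors of the centered data, ordered by variance.
mean = patches.mean(axis=0)
centered = patches - mean
_, _, Vt = np.linalg.svd(centered, full_matrices=False)

m = 8                               # keep the m leading basis vectors
codes = centered @ Vt[:m].T         # compression: 64 values -> m coefficients
recon = codes @ Vt[:m] + mean       # linear reconstruction

mse = float(((patches - recon) ** 2).mean())
```

Because the basis vectors are ordered by captured variance, truncating to the leading m components minimizes the energy lost, which is exactly the optimality property of the KLT claimed above; a linear 2-layer perceptron converges to the same subspace.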
7.10. Image segmentation
Image segmentation provides a means for evaluating the association of a particular pixel
with an object of interest within an image. Image segmentation aids the analysis of the shape
of objects and edges. By segmentation we mean labeling the image at every voxel with the
correct anatomical descriptor.
Some applications are:
- magnetic resonance,
- computed tomography,
- surgical planning,
- radiation therapy.
Artificial neural networks have been used as a tool for image segmentation in the field
of echocardiography [20,22,24]; these studies showed that the segmented images better
preserved the heart structure, at the cost of a higher fragmentation of the image. They also
showed that segmented images had sufficient details of the anatomy of the heart to allow
medical diagnosis. Ahmed and Farag, 1997, have shown that neural networks yield
accurate results through a better extraction of the 3-D anatomical structures of the brain [21].
Also, they claim that their technique could be adapted to real-time applications of image
analysis. Other researchers have used neural networks as an effective tool for image
segmentation [24-26], with emphasis on MRI.
7.11. Image restoration
Image restoration addresses the problem of retrieving the source image from its degraded
version. A considerable amount of research has focused on image restoration [46-52]. Perry
and Guan [47] have used an ANN model for image reconstruction with a-priori edge
information to recover the details and reduce the ringing artifacts of subband-coded images.
Their approach is particularly suitable for high-contrast images and also has a great
potential for real-time implementation. Qian and Clarke [52] have developed a novel
wavelet-based neural network with fuzzy-logic adaptivity for image restoration. Their
objective was to restore images degraded by photon scattering and collimator photon
penetration, which are common when using a gamma camera. They showed that their
approach is efficient in restoring the degraded image and also more efficient by a factor of
4-6 compared to an order-statistic neural network hybrid model. The restored images were
smoother, with fewer ringing artifacts and better-defined source boundaries. Also, their
model was stable under poor signal-to-noise ratio and low-count statistics. In addition, an
adaptive neural network filter for the removal of impulse noise in digital images has been
reported, together with a detailed statistical analysis of the approach in contrast with
traditional median-type filters. The results demonstrate the ability of the filter to detect
the positions of noisy pixels and show that it outperforms traditional median-type filters.
7.12. Applications
7.12.1 Military applications
Image processing coupled with ANNs finds use in determining aircraft orientation,
tracking (localization), and target recognition [41-43]. Rogers et al. [42] have explored the
use of ANNs for automatic target recognition (ATR) and have shown it to be an interesting
and useful alternative processing strategy. Agarwal and Chaudhuri [41] obtained a set of
spatial moments to characterize the different views of an aircraft, corresponding to the
feature-space representation of the aircraft. The feature space is partitioned into feature
vectors, and these vectors are used to train several multi-layer perceptrons (MLPs) to develop
functional relations for obtaining the target orientation. They show that training several
MLPs provides a better analysis of aircraft orientation when compared to a single MLP
trained across the entire feature space. Liu et al. [65] have used a two-layered ANN for
extracting hydrographic objects from satellite images. They have shown that the neural
network approach preserves boundaries and edges with high accuracy while achieving
greater suppression of noise within each region.
Super-resolution techniques aim to obtain an image with a resolution higher than
that allowed by the imaging sensor, with applications in areas such as surveillance and
automatic target recognition. In a two-step procedure, a super-resolved image is obtained
through the convolution of a low-resolution test image with an established family of kernels
[79]. The proposed architecture for super-resolving images using a family of kernels is
shown in Fig. 10.

[Figure 10: the super-resolved neighborhoods are arranged into the super-resolved image.]
The low-resolution image neighborhoods are partitioned into a finite number of clusters,
where the neighborhoods within each cluster exhibit similarities. Then, a set of kernels,
implemented as linear associative memories (LAMs), can be developed which optimally
transform each clustered neighborhood into its corresponding high-resolution neighborhood [79].
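A single kernel of this kind, implemented as a least-squares linear associative memory, can be sketched on synthetic 1-D data. The clustering stage of [79] (one LAM per cluster of neighborhoods) is omitted here, and the signal and neighborhood sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic pair: a smooth-ish high-resolution signal and its 2x block average.
hi = np.cumsum(rng.standard_normal(4000)) * 0.1
lo = hi.reshape(-1, 2).mean(axis=1)

# Each 3-sample low-resolution neighborhood maps to the 2 high-resolution
# samples under its center sample.
L = np.array([lo[i - 1:i + 2] for i in range(1, len(lo) - 1)])
H = np.array([hi[2 * i:2 * i + 2] for i in range(1, len(lo) - 1)])

# Linear associative memory: least-squares kernel W with H ~ L @ W.
W, *_ = np.linalg.lstsq(L, H, rcond=None)

err = float(np.abs(L @ W - H).mean())   # residual on this toy data
```

Per-cluster kernels improve on this single global map because each cluster of similar neighborhoods admits a more specialized linear transform.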
After the low-resolution images are synthesized, the training of the super-resolution
architecture proceeds according to Fig. 11.

[Figure 11: training of the super-resolution architecture from the high-resolution image.]
7.12.2 Remote sensing
Similarly, it is important to extract features from the Doppler echo information of moving
target indication (MTI) radar and to recognize moving radar targets by the statistical methods
of pattern recognition [6]. Imaging parameters of interest are:
- spatial resolution,
- spectral resolution.
Combinations of neural and statistical algorithms for supervised classification have been
utilized effectively [2,6,9].
Based on the multisensor image classification by structured neural network principles
[77], presented in section 7.4, a tree-like network used to analyze and process data obtained
through a multisensor remote-sensing imager is shown in Fig. 12. The multisensor remote-sensing
imager consists of a Daedalus 1268 Airborne Thematic Mapper (ATM) scanner,
together with a multiband, fully polarimetric, NASA/JPL imaging synthetic aperture radar
(SAR). The imager system and the accompanying network architecture have been used to
analyze images related to agricultural fields. Specifically, the selected imaging pixels
represented five different agricultural fields. For each pixel, a feature vector was
computed by utilizing the intensity values in the six ATM bands, and nine features were
extracted from the SAR images.
7.12.3 Nuclear magnetic resonance spectroscopy
Nuclear magnetic resonance (NMR) spectroscopy is used as a non-invasive tool for tissue
biochemistry and diagnosis of tissue abnormalities be it focal lesions or tumors [2], [2540].
Artificial neural network approach has been used as an effective tool in NMR spectral
characterization. Specifically, important steps in analyzing MRI and CT is segmentation,
i.e., pixels are labeled with terms denoting types of tissue.
Figure 13: Block diagram of the adaptive recurrent neural network processor.
By means of the adaptive recurrent neural network processor, shown in Fig. 13, detailed
topographical properties and symmetries in MRI can be studied.
The accurate and reproducible interpretation of an MRI remains an extremely time-consuming
and costly task. MRI scans allow measurement of three tissue-specific
parameters:
- the spin-spin relaxation time (T2),
- the spin-lattice relaxation time (T1), and
- the proton density.
Each pixel is thus represented by a 3-D vector.
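A minimal sketch of how such 3-D pixel vectors could be labeled is nearest-centroid classification against tissue prototypes; the tissue names and prototype values below are illustrative stand-ins, not clinical reference data:

```python
import math

# Nearest-centroid segmentation sketch: each pixel is a 3-D vector
# (T2, T1, proton density) and is labeled with the closest tissue
# prototype. Prototype values are illustrative only.

prototypes = {
    "white_matter": (80.0, 600.0, 0.70),
    "gray_matter":  (100.0, 950.0, 0.82),
    "csf":          (250.0, 2500.0, 0.97),
}

def label_pixel(pixel):
    """Return the tissue prototype nearest (Euclidean) to the pixel vector."""
    return min(prototypes, key=lambda t: math.dist(prototypes[t], pixel))

print(label_pixel((95.0, 900.0, 0.80)))  # gray_matter
```

In practice a trained network (e.g. the self-organizing maps of [23-24]) would replace the fixed prototypes, but the pixel representation is the same.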
Several research groups have used ANNs to differentiate between benign and malignant
tissue [29-35]; specifically:
- El-Deredy and Branston [36] identified relevant features of 1H MR tumour spectra;
- Anthony et al. [34] classified toxin-induced changes in 1H NMR spectra of urine;
- thyroid neoplasms were classified in [35];
- high- and low-grade gliomas were classified in [37];
- lipoprotein lipids were quantified in [38,39], and muscle disease was classified in [40].
7.12.4 Mammography
Based on the discussion of section 7.5, the PCNN fusion architecture used to fuse breast
cancer and FLIR images is presented in Fig. 14.
Figure 14: PCNN fusion architecture used to fuse breast cancer and FLIR images [78].
Object detection is performed by means of PCNN fusion networks that take an original
and several filtered versions of a gray-scale image and output a single image in which
the desired objects are the brightest and thus easily detected. Each PCNN has one neuron
per input image pixel, and the pulse rate of each neuron in the center PCNN is used as the
brightness value for the corresponding pixel in the output image.
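The pulse-rate-as-brightness idea can be sketched with a heavily simplified single-image PCNN; the decay constants, linking strength and iteration count below are illustrative choices, not the values used in [78]:

```python
# Simplified PCNN sketch (one neuron per pixel). The pulse count over the
# iterations serves as the output brightness. Parameters are illustrative.

def pcnn_pulse_counts(image, steps=20, beta=0.2,
                      decay_f=0.7, decay_t=0.8, v_theta=5.0):
    h, w = len(image), len(image[0])
    F = [[0.0] * w for _ in range(h)]   # feeding compartments
    Y = [[0] * w for _ in range(h)]     # pulses from the previous step
    T = [[1.0] * w for _ in range(h)]   # dynamic thresholds
    counts = [[0] * w for _ in range(h)]
    for _ in range(steps):
        newY = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                F[i][j] = decay_f * F[i][j] + image[i][j]
                # linking input: pulses of the 4-connected neighbours
                L = sum(Y[i + di][j + dj]
                        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= i + di < h and 0 <= j + dj < w)
                U = F[i][j] * (1.0 + beta * L)   # internal activity
                if U > T[i][j]:
                    newY[i][j] = 1
                    counts[i][j] += 1
                # threshold decays, and jumps after each pulse
                T[i][j] = decay_t * T[i][j] + v_theta * newY[i][j]
        Y = newY
    return counts

img = [[0.1, 0.1, 0.1],
       [0.1, 0.9, 0.1],
       [0.1, 0.1, 0.1]]
c = pcnn_pulse_counts(img)
print(c[1][1] > c[0][0])  # True: the bright pixel pulses more often
```

Brighter input pixels charge their feeding compartment faster and therefore cross the decaying threshold more often, which is exactly the pulse-rate brightness used in the fusion scheme.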
7.13. Future research directions
Flat-panel digital detectors are being developed for radiological modalities such as
radiography and fluoroscopy [66-73]. These systems comprise large-area pixel arrays which
use matrix addressing to read out the charge resulting from x-ray absorption in the detector
medium. There are two methods for making flat-panel image sensors. In one method, the
indirect method [1], a phosphor converter absorbs the incident x-rays and emits visible light,
which is converted by an a-Si:H p-i-n photodiode into an electronic image. The signal is read
out by utilizing a thin-film transistor (TFT) readout array. Alternatively, various diode
switching modes can serve as the electronic readout. However, the diode readout exhibits a
strong nonlinearity and large charge injection. Overall, the indirect method is inefficient
and can lead to increased image noise, particularly when signals are low. The other
approach, the direct method [1], uses a photoconductive layer to absorb x-rays and collect
the ionization charge, which is subsequently read out by an active matrix array. Lead iodide
(PbI2), cadmium zinc telluride (CdZnTe) [67,68], and amorphous selenium (a-Se) are good
candidates. The direct method has a higher intrinsic resolution than the indirect
method because it avoids the x-ray-to-light conversion stage. However, poor transport
characteristics, associated with the slow motion of ions and the presence of impurities in
CdZnTe detectors, can compromise the otherwise excellent detector performance.
Future directions of NN research in digital radiography, or more generally in digital
electronic sensor design, should include the optimization of detector parameters [73],
such as:
- collection efficiency
- space charge
- charge-carrier trapping-detrapping
- electric field non-uniformity
- detector medium aging or impurities
- electron-hole recombination
- radiation scattering
- multipath detection-parallax effects.
As a first step, the design of digital sensors would be optimized by means of NN
algorithms trained to extract and classify intrinsic detector signal parameters such
as amplitude, rise and fall times, transit time, signal dispersion and distortion, and
signal-to-noise ratio (SNR) characteristics (Fig. 15). As a result, enhanced image quality
would be achieved by removing nonlinearities, noise, and multipath detection effects [73].
Figure 15: Neural network classifier for digital sensor design optimization.
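As a sketch of the kind of pulse parameters such a classifier would consume, the amplitude and 10-90 % rise time of a sampled detector pulse can be extracted as follows; the pulse shape, sampling step and helper name are all made up for illustration:

```python
# Illustrative extraction of detector-pulse parameters (amplitude and
# 10-90 % rise time) of the kind proposed as neural network inputs.
# The pulse samples and sampling interval are hypothetical.

def pulse_parameters(samples, dt):
    """Return amplitude and 10-90 % rise time of a sampled pulse."""
    amp = max(samples)
    t10 = next(i for i, s in enumerate(samples) if s >= 0.1 * amp)
    t90 = next(i for i, s in enumerate(samples) if s >= 0.9 * amp)
    return {"amplitude": amp, "rise_time": (t90 - t10) * dt}

# A toy ramp-and-decay pulse sampled at dt = 1 us
pulse = [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0, 0.6, 0.3, 0.1, 0.0]
print(pulse_parameters(pulse, dt=1.0))  # {'amplitude': 1.0, 'rise_time': 4.0}
```

Vectors of such parameters (together with fall time, transit time, dispersion and SNR) would form the input feature space of the classifier in Fig. 15.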
In phase-locked loop neural networks [80,81], the memorized patterns are not equilibria but
synchronized oscillatory states in which neurons fire periodically, establishing a
relationship between their phases.
References
[1] G.C. Giakos, "Key Paradigms of Emerging Imaging Sensor Technologies", IEEE Transactions on
Instrumentation and Measurement, vol. 40, no. 6, pp. 1-9, December 1998 (invited paper).
[2] A.D. Kulkarni, "Computer Vision and Fuzzy-Neural Systems", Prentice Hall, 2001.
[3] A.D. Kulkarni, "Artificial Neural Networks for Image Understanding", ITP, 1994.
[4] R. Ritter and J.N. Wilson, "Computer Vision Algorithms in Image Algebra", CRC, 2001.
[5] L.M. Fu, "Neural Networks in Computer Intelligence", McGraw-Hill, 1994.
[6] T. Suzuki, H. Ogura, and S. Fujimura, "Noise and Clutter Rejection in Radars and Imaging Sensors",
Proc. of the Second International Symposium on Noise and Clutter Rejection in Radars, IEICE, 1990.
[7] F. Russo, "Evolutionary Neural Fuzzy Systems for Data Filtering", IEEE Instrumentation and
Measurement Technology Conference Proceedings, pp. 826-831, 1998.
[8] R. Battiti and A.M. Colla, Democracy in neural nets: voting schemes for classification, Neural
Networks, v. 7, pp. 691-707, 1994.
[9] G. Giacinto and F. Roli, Ensembles of neural networks for soft classification of remote sensing
images, Proceedings of the European Symposium on Intelligent Techniques, Bari, Italy, pp. 166-170,
1997.
[10] G. Giacinto, F. Roli, and L. Bruzzone, Combination of neural and statistical algorithms for supervised
classification of remote-sensing images, Pattern Recognition Letters, v. 21, n. 5, pp. 385-397, 2000.
[11] T.K. Ho, J.J. Hull, and S.N. Srihari, Decision combination in multiple classifier systems, IEEE
Transactions on Pattern Analysis and Machine Intelligence, n. 18, pp. 66-75, 1994.
[12] Y.S. Huang, K. Liu, and C.Y. Suen, A method of combining multiple experts for the recognition of
unconstrained handwritten numerals, IEEE Transactions on Pattern Analysis and Machine Intelligence,
n. 17, pp. 90-94, 1995.
[13] J. Kittler, M. Hatef, R.P.W. Duin, and J. Matas, On combining classifiers, IEEE Transactions on
Pattern Analysis and Machine Intelligence, n. 20, pp. 226-239, 1998.
[14] L. Xu, A. Krzyzak, and C.Y. Suen, Methods for combining multiple classifiers and their applications
to handwriting recognition, IEEE Transactions on Systems, Man, and Cybernetics, n. 22, pp. 418-435,
1992.
[15] R.J. Ferrari, A.C.P.L.F. de Carvalho, P.M. Azevedo Marques, and A.F. Frere, Computerized classification
of breast lesions: shape and texture analysis using an artificial neural network, Image Processing and its
Applications, Conference publication, n. 465, pp. 517-521, 1999.
[16] L. Shen, R.M. Rangayyan, and J.E.L. Desautels, Application of shape analysis to mammographic
calcifications, IEEE Transactions on Medical Imaging, n. 13, pp. 263-274, 1994.
[17] W.G. Wee, M. Moskowitz, W.C. Chang, Y.C. Ting, and S. Pemmeraju, Evaluation of mammographic
calcifications using a computer program, Radiology, n. 110, pp. 717-720, 1975.
[18] H.P. Chan, K. Doi, S. Galhotra, C.J. Vyborny, H. MacMahon, and P.M. Jokich, Image feature analysis
and computer-aided diagnosis in digital radiography. I. Automated detection of microcalcifications in
mammography, Medical Physics, n. 14, pp. 538-548, 1987.
[19] R.M. Haralick, K. Shanmugam, and I. Dinstein, Texture features for image classification, IEEE
Transactions on Systems, Man, and Cybernetics, n. 3, pp. 610-621, 1973.
[20] L. Piccoli, A. Dahmer, J. Scharcanski, and P.O.A. Navaux, Fetal echocardiographic image
segmentation using neural networks, Image Processing and its Applications, Conference publication,
n. 465, pp. 507-511, 1999.
[21] M.N. Ahmed and A.A. Farag, Two-stage neural network for volume segmentation of medical images,
IEEE Transactions on Medical Imaging, pp. 1373-1378, 1997.
[22] M. Sussner, T. Budil, and G. Porenta, Segmentation and edge-detection of echocardiograms using
artificial neuronal networks, EANN.
[23] M. Belohlavek, A. Manduca, T. Behrenbeck, J.B. Seward, and F. Greenleaf, Image analysis using
modified self-organizing maps: Automated delineation of the left ventricular cavity boundary in serial
echocardiograms, VBC, n. 1131, pp. 247-252, 1996.
[24] S. Haring, M. Viergever, and K. Kok, A multiscale approach to image segmentation using Kohonen
networks, Proceedings IPMI, Berlin, pp. 212-224, 1993.
[25] S.C. Amartur and Y. Takefuji, Optimization neural networks for the segmentation of MRI images,
IEEE Transactions on Medical Imaging, v. 11, n. 2, pp. 215-220, 1992.
[26] X. Li, S. Bhide, and M.R. Kabuka, Labeling of MRI brain images using Boolean neural network, IEEE
Transactions on Medical Imaging, v. 15, pp. 628-638, 1996.
[27] M.N. Ahmed and A.A. Farag, 3D segmentation and labeling of CT brain images using self-organizing
Kohonen network to quantify TBI recovery, Proceedings of the IEEE Engineering in Medicine and
Biology Society (EMBS) Conference, Amsterdam, 1996.
[28] D.G. Gadian, "NMR and its Application to Living Systems", Oxford Science Publications, Oxford, 1995.
[29] M.L. Aston and P. Wilding, Application of neural networks to the interpretation of laboratory data in
cancer diagnosis, Clinical Chemistry, n. 38, pp. 34-38, 1992.
[30] S.L. Howells, R.J. Maxwell, A.C. Peet, and J.R. Griffiths, An investigation of tumour 1H nuclear
magnetic resonance spectra by the application of chemometric techniques, Mag. Reson. Med., n. 28,
pp. 214-236, 1992.
[31] N.M. Branston, R.J. Maxwell, and S.L. Howells, Generalization performance using backpropagation
algorithms applied to patterns derived from tumour 1H-NMR spectra, Journal of Microcomputer
Applications, n. 16, pp. 113-123, 1993.
[32] S.L. Howells, R.J. Maxwell, F.A. Howe, A.C. Peet, and J.R. Griffiths, Pattern recognition of 31P
magnetic resonance spectroscopy tumour spectra obtained in vivo, NMR in Biomedicine, n. 6, pp. 237-241, 1993.
[33] P.J.G. Lisboa and A.R. Mehriehnavi, Sensitivity methods for variable selection using the MLP,
Proceedings International Workshop on Neural Networks for Identification, Control, Robotics and
Signal Processing, pp. 330-338, 1996.
[34] M.L. Anthony, V.S. Rose, J.K. Nicholson, and J.C. Lindon, Classification of toxin-induced changes in
1H NMR spectra of urine using an artificial neural network, Journal of Pharmaceutical and Biomedical
Analysis, n. 12, pp. 205-211, 1995.
[35] R.L. Somorjai, A.E. Nikulin, N. Pizzi, D. Jackson, G. Scarth, B. Dolenko, H. Gordon, P. Russell, C.L.
Lean, L. Delbridge, C.E. Mountford, and I.C.P. Smith, Computerized consensus diagnosis: A
classification strategy for the robust analysis of MR spectra. I. Application to 1H spectra of thyroid
neoplasms, Magnetic Resonance Med., n. 33, pp. 257-263, 1995.
[36] W. El-Deredy and N.M. Branston, Identification of relevant features of 1H MR tumour spectra using
neural networks, Proc. IEEE Int. Conf. on Artificial Neural Networks, pp. 454-459, 1995.
[37] N.M. Branston, W. El-Deredy, A.A. Sankar, J. Darling, S.R. Williams, and D.G.T. Thomas, Neural
network analysis of 1H-NMR spectra identifies metabolites differentiating between high and low grade
astrocytomas in vitro, J. Neuro-Oncology, n. 28, pp. 83, 1996.
[38] Y. Hiltunen, E. Heiniemi, and M. Ala-Korpela, Lipoprotein lipid quantification by neural network
analysis of 1H NMR spectra from human plasma, J. Mag. Reson. Series B, n. 106, pp. 191-194, 1995.
[39] M. Ala-Korpela, Y. Hiltunen, and J.D. Bell, Quantification of biomedical NMR data using artificial
neural network analysis: Lipoprotein lipid profiles from 1H NMR data of human plasma, NMR
Biomed., n. 8, pp. 235-244, 1995.
[40] S. Kari, N.J. Olsen, and J.H. Park, Evaluation of muscle disease using artificial neural network analysis
of 31P MR spectroscopy data, Mag. Res. Med., n. 34, pp. 664-672, 1995.
[41] S. Agarwal and S. Chaudhuri, Determination of aircraft orientation for a vision-based system using
artificial neural networks, Journal of Mathematical Imaging and Vision, n. 8, pp. 255-269, 1998.
[42] S.K. Rogers, J.M. Colombi, C.E. Martin, J.C. Gainy, K.H. Fielding, T.J. Burns, D.W. Ruck, M.
Kabrisky, and M. Oxley, Neural networks for automatic target recognition, Neural Networks, n. 7/8, v.
8, pp. 1153-1184, 1995.
[43] S. Shams, Neural network optimization for multi-target multi-sensor passive tracking, Proceedings of
the IEEE, Special issue on Engineering Applications of Artificial Neural Networks, v. 84, n. 10,
pp. 1442-1458, 1996.
[44] A.K. Katsaggelos and R.M. Mersereau, A regularized iterative image restoration algorithm, IEEE
Transactions on Signal Processing, v. 39, n. 4, pp. 914-929, 1991.
[45] P. Bao and D. Wang, An edge-preserving image reconstruction using neural network, Journal of
Mathematical Imaging and Vision, v. 14, pp. 117-130, 2001.
[46] J. Paik and A. Katsaggelos, Image restoration using a modified Hopfield network, IEEE Transactions
on Image Processing, v. 1, n. 1, pp. 49-63, 1992.
[47] S. Perry and L. Guan, Neural network restoration of images suffering space-variant distortion,
Electronics Letters, v. 31, n. 16, pp. 1358-1359, 1995.
[48] S.W. Perry and L. Guan, A statistics-based weight assignment in a Hopfield neural network for
adaptive image restoration, IEEE, pp. 922-927, 1998.
[49] Y. Yang, N.P. Galatsanos, and A.K. Katsaggelos, Regularized reconstruction to reduce blocking
artifacts of block discrete cosine transform compressed images, IEEE Transactions on Circuits and
Systems for Video Technology, v. 3, n. 6, pp. 421-432, 1993.
[50] Y. Zhou, R. Chellappa, A. Vaid, and B. Jenkins, Image restoration using neural network, IEEE
Transactions on Acoustics, Speech, and Signal Processing, v. 36, n. 7, pp. 1141-1151, 1988.
[51] W. Qian and L.P. Clarke, Wavelet-based neural network with fuzzy-logic adaptivity for nuclear image
restoration, Proceedings of the IEEE, v. 84, n. 10, pp. 1458-1473, 1996.
[52] A.N. Netravali and J.O. Limb, Picture coding: A review, Proceedings of the IEEE, v. 68, pp. 366-406,
1980.
[53] A.K. Jain, Image data compression: A review, Proceedings of the IEEE, v. 69, pp. 349-389, 1981.
[54] N.S. Jayant and P. Noll, Digital Coding of Waveforms, Englewood Cliffs, NJ: Prentice-Hall, 1984.
[55] A.N. Netravali and B.G. Haskell, Digital Pictures: Representation and Compression, New York:
Plenum, 1988.
[56] A. Gersho and R.M. Gray, "Vector Quantization and Signal Compression", Norwell, MA: Kluwer,
1992.
[57] N. Jayant, J. Johnston, and R. Safranek, Signal compression based on models of human perception,
Proceedings of the IEEE, v. 81, pp. 1385-1421, 1993.
[58] R.D. Dony and S. Haykin, Neural network approaches to image compression, Proceedings of the IEEE,
v. 83, n. 2, pp. 288-303, 1995.
[59] L.E. Russo and E.G. Real, Image compression using an outer product neural network, Proceedings of
IEEE Int. Conf. Acoust. Speech and Signal Process., pp. II 377-389, 1992.
[60] A. Namphol, M. Arozullah, and S. Chin, Higher order data compression with neural networks, Proc.
Int. Joint Conf. on Neural Networks, pp. 15559, 1991.
[61] R. Kohno, M. Arai, and H. Imai, Image compression using a neural network with learning capability of
variable function of a neural unit, SPIE v. 1360, Visual Commun. and Image Proc., pp. 69-75, 1990.
[62] D. Anthony, E. Hines, D. Taylor, and J. Barham, A study of data compression using neural networks
and principal component analysis, Colloquium on Biomedical Applications of Digital Signal
Processing, pp. 1-5, 1989.
[63] G.L. Sicuranza, G. Ramponi, and S. Marsi, Artificial neural network for image compression,
Electronics Letters, v. 26, pp. 477-479, 1990.
[64] N.G. Panagiotidis, D. Kalogeras, S.D. Kollias, and A. Stafylopatis, Neural network-assisted effective
lossy compression of medical images, Proceedings of the IEEE, v. 84, n. 10, pp. 1474-1487, 1996.
[65] X. Liu, D. Wang, and J.R. Ramirez, Extracting hydrographic objects from satellite images using a
two-layered neural network, IEEE, pp. 897-902, 1998.
[66] C.E. Cann et al., "Quantification of Calcium in Solitary Pulmonary Nodules Using Single- and
Dual-Energy CT", Radiology, vol. 145, pp. 493, 1982.
[67] G.C. Giakos, A. Dasgupta, S. Suryanarayanan, S. Chowdhury, R. Guntupalli, S. Vedantham, B. Pillai,
and A. Passalaqua, "Sensitometric Response of CdZnTe Detectors for Chest Radiography", IEEE
Transactions on Instrumentation and Measurement, vol. 47, no. 1, pp. 252-255, 1998.
[68] G.C. Giakos, S. Vedantham, S. Chowdhury, J. Odogba, A. Dasgupta, D.B. Sheffer, R. Nemer,
R. Guntupalli, S. Suryanarayanan, V. Lozada, R.J. Endorf, and A. Passalaqua, "Study of
Detection Efficiency of CdZnTe Detectors for Digital Radiography", IEEE Transactions on
Instrumentation and Measurement, vol. 47, no. 1, pp. 244-251, 1998.
[69] G.C. Giakos and S. Chowdhury, "Multimedia Imaging Detectors Operating on Gas-Solid State
Ionization Principles", IEEE Transactions on Instrumentation and Measurement, vol. 40, no. 5,
pp. 1-9, October 1998.
[70] G.C. Giakos, US Patent 6,207,958, "Multimedia Detectors for Medical Imaging", March 23, 2001.
[71] G.C. Giakos, US Patent 6,069,362, "Multidensity and Multi-atomic Number Detector Media for
Applications", May 30, 2000.
[72] G.C. Giakos, European Patent 99918933.52213, "Multidensity and Multi-atomic Number Detector
Media for Applications", December 28, 2000.
[73] G.C. Giakos, NATO Advanced Research Institute, Lecture Series, NIMIA 2001, Crema, Italy, 9-20
October 2001.
[74] M.M. Van Hulle, T. Tollenaere, and G.A. Orban, "An Adaptive Neural Network Model for
Distinguishing Line- and Edge Detection from Texture Segregation", International Joint Conference on
Neural Networks, Singapore, 18-21 November, pp. 1409-1414, 1991.
[75] O.K. Ersoy and D. Hong, "Parallel, Self-Organizing, Hierarchical Neural Networks", IEEE
Transactions on Neural Networks, vol. 1, no. 2, pp. 167-178, 1990.
[76] H. Bischof, W. Schneider, and A.J. Pinz, "Multispectral Classification of Landsat-images using Neural
Networks", IEEE Transactions on Geoscience and Remote Sensing, vol. 30, no. 3, pp. 482-490, 1992.
[77] F. Roli, S.B. Serpico, and G. Vernazza, "Multisensor Image Classification by Structured Neural
Networks", IEEE Trans. on Geoscience and Remote Sensing, vol. 28, no. 4, pp. 310-320, 1993.
[78] R.P. Broussard, S.K. Rogers, M.E. Oxley, and G.L. Tarr, "Physiologically Motivated Image Fusion for
Object Detection using a Pulse Coupled Neural Network", IEEE Transactions on Neural Networks,
vol. 10, no. 3, pp. 554-562, 1999.
[79] F.M. Candocia and J.C. Principe, "Super-Resolution of Images Based on Local Correlations", IEEE
Transactions on Neural Networks, vol. 10, no. 2, pp. 372-380, 1999.
[80] T. Aoyagi, "Network of Neural Oscillators for Retrieving Phase Information", Phys. Rev. Lett., vol.
74, pp. 4075-4078, 1995.
[81] F.C. Hoppensteadt and E.M. Izhikevich, "Pattern Recognition via Synchronization in Phase-Locked
Loop Neural Networks", IEEE Transactions on Neural Networks, vol. 11, no. 3, 2000.
[82] F. Russo and G. Ramponi, "Fuzzy Methods for Multisensor Data Fusion", IEEE Transactions on
Instrumentation and Measurement, vol. 43, n. 2, pp. 288-294, 1994.
Chapter 8
Neural Networks
for Machine Condition Monitoring
and Fault Diagnosis
Robert X. Gao
Department of Mechanical and Industrial Engineering, University of Massachusetts
Amherst, MA 01003, USA
Abstract. This chapter introduces several fundamental aspects of neural networks and
their applications in industry, in particular for machine condition monitoring and
fault diagnosis. Several research highlights in bearing condition monitoring and health
assessment using neural networks are presented.
Two major issues concerning machine condition monitoring are machine fault diagnosis
and prognosis. Diagnosis refers to the determination of the current "health" status or
working condition of the machine being monitored, whereas prognosis refers to the
prediction of the remaining service life in the machine. Reliable diagnosis and prognosis
techniques not only reduce the risks of unexpected machine breakdowns, but also help in
prolonging machine life. For these reasons, the current trend in the maintenance industry
is shifting increasingly towards condition-based, preventative, and proactive maintenance.
8.1.1 State of Knowledge
In the machine tool industry, condition-based monitoring has been manifested through the
monitoring of the overall machine system (e.g. total energy consumption), the specific tools
(wear or lubrication status), the work piece (quality parameters), and the machining
processes (e.g. chip formation or temperature variation). The fault condition of a machine is
judged by symptoms and signs, which are generally related to the operation parameters.
The variation in time of these parameters is an indicator of the fault progression and can be
used to forecast the future trend of its development, as well as serving as the basis for
generating alarm signals. Among the various symptoms used, machine vibration has long
been used as a practical fault indicator [7-8]. Most machinery equipment consists of
bearings, gears, motor, shafts, and other rotating elements, and vibration caused by the
presence of structural faults in these components provides a source of information about the
machine health condition, since the vibration profile of the machine would change as the
fault develops. Such a change could be reflected by an increase in the vibration level at
characteristic frequencies. The fundamental issues in condition monitoring include: 1)
identification of the fault pattern, and 2) quantification of the fault development. The
physical variables that can be measured for the vibration analysis include displacement,
velocity, or acceleration. It is important to specify the frequencies at which the vibration
levels become critical for the type of machinery being monitored. Features representative of
a particular fault are then extracted from the measured data set by suitable signal
processing techniques.
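Two of the simplest time-domain features used for this purpose are the RMS level and the crest factor; the sketch below computes them on a synthetic signal, with thresholds and values chosen purely for illustration:

```python
import math

# Sketch of simple time-domain features commonly extracted from a
# vibration record for fault classification. The signal is synthetic.

def vibration_features(x):
    """Return RMS level, peak value, and crest factor of a signal."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    peak = max(abs(v) for v in x)
    return {"rms": rms, "peak": peak, "crest_factor": peak / rms}

# Ten full periods of a unit sine wave (200 samples)
signal = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]
f = vibration_features(signal)
print(round(f["crest_factor"], 2))  # 1.41, i.e. sqrt(2) for a pure sine
```

A rising crest factor is a classic early indicator of impulsive bearing defects, since isolated impacts raise the peak long before they raise the RMS.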
Historically, the identification of a faulty machine or machine components was made by
comparing the sound emitted by the machine to that from a "healthy" machine in good
working condition [9]. But this approach lacks objectivity, is vulnerable to ambient noise
and is subject to human error [10]. Other methods have included acoustic
emission (AE) signals, which are associated with the transient elastic waves generated by a
sudden release of strain energy. Such energy release is basically due to stress
concentrations, which can be caused by the presence of structural defects such as cracks.
Applications of sub-surface defect diagnosis using AE techniques have been reported in
[11-12]. General difficulties with AE-based measurement involve quantification of the
relatively low AE signal magnitudes and noise contamination from other machine
structures [13]. AE techniques have also been applied for tool breakage detection [3]. Surveys
have also revealed extensive use of AE sensors coupled with force sensors for tool wear
monitoring. Furthermore, temperature measurement has been used as an indirect technique
in conjunction with vibration analysis for tool condition monitoring. The advantage of
using temperature is that it is not related to structural defects as closely as the tribological
conditions do [14]. In addition, lubrication debris has also been considered a reasonably
good indicator of bearing wear [15]. However, since it is generally time-consuming to
collect and analyze the debris, such a technique is not suited for on-line applications.
The major components of a condition-based monitoring system include the machinery,
condition-monitoring sensors, signal processors, fault classifiers, machine models, and the
monitoring output. Errors and uncertainties in fault classification can lead to false alarms,
which motivates research for better, more robust and reliable condition monitoring systems.
determined and the combination of the two sets was used to serve as the reference base for
models to test other segments of data. Statistical modeling methodologies such as the Hidden
Markov Model (HMM) [34] have been found to be well suited for the classification of
operating parameters and defects.
8.2. Condition Monitoring of Rolling Bearings
8.2.1 Significance
Rolling element bearings have been used in virtually every machine system. Many of their
applications are critically important and require that the machines be maintained at highly
reliable condition to avoid unexpected, premature machine breakdowns. Defects arise in
bearings during their usage because of adverse operating conditions, faulty installation, or
material fatigue. Adverse operating conditions may be caused by overloading, insufficient
lubrication or over-lubrication, or contamination in the rolling contact zone. At any point in time, only
a portion of the rolling elements is within the load zone, and high stresses occur periodically
below the loaded surface. These stresses may cause microscopic cracks, which gradually
appear on the raceway surface after an extended period of use. Fragments of the raceway
then break away when rolling elements pass over these cracks, causing spalling or flaking
[36], which is a common mode of failure in bearings. The spall area increases with time and
can be identified by increased level of vibrations of the bearing. The debris generated in the
defect development process contaminates the lubricant, diminishes its effect, and causes
localized overloading [37].
Unexpected, premature bearing failure can be disastrous, especially if related to
transportation vehicles such as an airplane or a passenger train [38-39]. It is desired to
enable on-line bearing condition monitoring so that no time lag would exist between the
data collection, diagnosis and maintenance actions. In a motor reliability study, it was
found that bearing problems accounted for over 40 % of all machine failures [40]. It has
also been found that a majority of bearings fail before they attain their service life, and only
about a third die from "old age" due to surface fatigue [41]. Investigating the real reasons
for bearing failures and finding better ways of preventing them has drawn considerable
interest in the research community and industry in recent years.
Every time a rolling element hits a structural defect in the raceway, a series of
vibration pulses will be generated. Depending on the specific location of the defect (e.g. on
the inner or outer raceway, or on the rolling element itself), the family of the pulses will
contain characteristic frequencies specific to the bearing geometry and operation condition
(e.g. rotating speed). The highest pulse amplitude will be generated within the load zone of
the bearing. The difficulty in bearing fault detection stems from interference due to
structural vibrations generated by other parts of the machine system. A bearing diagnostic
tool must be robust enough to differentiate the various vibration signals, in order
to classify faults effectively without generating false alarms. Understanding of the bearing
defect characteristics is critical to the proper design of bearing diagnostic tools.
8.2.2 Bearing Failure Modes
Due to the rotational nature of bearing operations, bearing failures are associated with
characteristic defect frequencies that are related to the speed of the bearing, the location
where the defect appears, and the bearing geometry [42]. Many of the defect frequencies
can be determined analytically, as shown in [43]. For example, if a point defect is located in
the outer race of the bearing, a frequency component of BPFO (ball pass frequency for
outer race) can be identified as:
BPFO = (Z / 2) Ni [1 - (D / dm) cos a]          (1)
where Ni is the rotational speed of the inner raceway in Hz, dm is the diameter of the pitch
circle of the rolling balls, D is the ball diameter, a is the contact angle and Z is the number
of balls in the bearing. Such characteristic frequencies play an important role in bearing
fault diagnosis and prognosis, especially when using spectral techniques. They also can be
used as input parameters for a diagnostic neural network [44].
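Eq. (1) can be evaluated directly; the bearing dimensions in this sketch are illustrative, not taken from any real bearing catalogue:

```python
import math

# Direct evaluation of Eq. (1): BPFO = (Z/2) * Ni * (1 - (D/dm) * cos(a)),
# where Ni is the inner raceway speed in Hz, Z the number of balls,
# D the ball diameter, dm the pitch diameter, a the contact angle.

def bpfo(n_inner_hz, n_balls, ball_dia, pitch_dia, contact_angle_rad):
    """Ball pass frequency for an outer-race defect, in Hz."""
    return (n_balls / 2.0) * n_inner_hz * \
           (1.0 - (ball_dia / pitch_dia) * math.cos(contact_angle_rad))

# Illustrative bearing: 30 Hz shaft speed, 9 balls, D = 8 mm,
# dm = 40 mm, zero contact angle
print(round(bpfo(30.0, 9, 8.0, 40.0, 0.0), 1))  # 108.0 Hz
```

A spectral peak at (or near) this frequency, and at its harmonics, is what a diagnostic network would look for when an outer-race defect is suspected.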
8.2.3 Research Challenge
Research on bearing prognosis focuses on the prediction of a bearing's remaining life.
Prognosis is a logical step forward from fault diagnosis. However, it has been found that
reliably predicting the remaining service life of a bearing based on what has been diagnosed
can be highly challenging, due to the uncertainty involved. As the vibrations produced by a
surface defect in the bearing are periodic in nature, the defect characteristic frequencies are
often used in conjunction with other time-domain parameters (e.g. RMS or peak values) for
diagnosis and prognosis purposes. To ensure reliable analysis, the defect frequencies need
to be distinct and separable from the rest of the signals.
Bearing defects can be broadly classified as distributed and localized. The lack of
roundness and uneven ball diameter are examples of distributed defects, whereas spalls or
corrosion spots are typical localized defects. Difficulty in bearing diagnosis arises when
frequency components from multiple defects overlap in the spectrum, mixing up with the
harmonics and interference. In particular, the frequency spectrum of the vibration from a
bearing with multiple defects may appear similar to the spectrum from a bearing with a
single defect, causing signal "masking", as illustrated in Figure 1, where S(f) is the
vibration amplitude and fi is the inner raceway defect frequency; the angular separation of
the two inner raceway faults is given in degrees [45]. Thus, designing a bearing diagnostic tool
that can learn from the signal variations due to fault "growth" presents a research challenge
as well as an opportunity for enhanced bearing condition monitoring.
A neural network estimation of flank wear has been demonstrated in [46], using a
recurrent neural network. Experiments were designed for five cutting speeds and five feeds,
at a constant depth of cut on a heavy-duty lathe. Three sensors were used to measure 1)
cutting, feed, and thrust forces, 2) vibrations along the main and feed directions, and 3)
acoustic emission of the tool. The network estimated the current flank wear using a
time-lagged predicted value and six other inputs, as shown in Figure 2. The measured signals
were transformed into the wavelet domain and three wavelet coefficients were used as part
of the input vector to the network. A fresh tool edge was used for cutting during each
experimental run. Signals were collected every minute and a microscope was used to
measure the flank wear. The network was trained using these observed values and was
tested using 150 patterns. The overall estimation error was below 0.0011 inch, which was
better than the pre-defined limit of 10 % of the total range. This study showed that a simple
and robust recurrent network architecture was capable of estimating continuous flank wear.
Moreover, it illustrated the potential of such an architecture for failure and degradation
estimation in other machining processes.
based on the use of values of a vibration parameter from past time steps, which served
as input to the network (K(t-1), K(t-2), ..., K(t-n)), as shown in Figure 3(B). For predicting
a discrete number of time steps into the future, the time-lagged values of the predictions
were used as input parameters. Hence, the values K(t-1), K(t-2), ..., K(t-n) were predicted
values for the output, time-lagged by the appropriate number of unit delay parameters D.
Investigations of
remaining life prognosis using a recurrent neural network have also been reported, where
the advantage of such a network over other prognosis schemes was illustrated [44].
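The recursive multi-step scheme of Figure 3(B) can be sketched as follows; the "network" here is a stand-in linear model with hypothetical weights, not a trained recurrent network:

```python
# Sketch of recursive multi-step prediction with time-lagged inputs:
# each new prediction is fed back as an input for the next step.
# The weighted sum below is a stand-in for the trained network.

def predict_ahead(history, weights, steps):
    """Predict `steps` values ahead, feeding predictions back as inputs."""
    lags = list(history)               # most recent value last
    out = []
    for _ in range(steps):
        x = lags[-len(weights):]       # the last n time-lagged values
        y = sum(w * v for w, v in zip(weights, x))
        out.append(y)
        lags.append(y)                 # time-lagged feedback of the prediction
    return out

# A stand-in model that simply averages the last two values
print(predict_ahead([1.0, 3.0], weights=[0.5, 0.5], steps=3))
# [2.0, 2.5, 2.25]
```

Because each prediction becomes an input to the next step, errors compound with the horizon, which is one reason remaining-life prognosis is harder than one-step diagnosis.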
neural network was found to be better suited than statistical analysis and genetic algorithms
for the drill wear monitoring task.
In another application of neural networks using multiple system inputs, the learning
abilities of a back propagation network for turning operations were studied [50]. The input
variables included feed rate, cutting depth and cutting speed and their effect on the output
variables (cutting force, power, temperature and surface finish) was studied. The network
was used to estimate the material removal rate subject to the operating conditions. A feed
forward network was used for the purpose, and it was shown that the network could
effectively learn with the desired level of accuracy. An "incremental" scheme, as illustrated
in Figure 4, was studied in which the network learned and synthesized simultaneously. For
the three inputs, corresponding output values measured by sensors are fed to train the
network. The weights of the network were then adjusted and the network was considered
partially trained. Subsequently, the system predicted an optimal input condition based on the constraints or performance indices. This "incremental" learning continued until the predicted input reached a level at which the error between the outputs recorded by the sensors and the outputs of the neural network was within the predetermined limits.
Figure 4: Incremental scheme applied in a neural network for tool condition monitoring
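The incremental loop can be sketched as follows; `RunningGain` is a deliberately trivial stand-in for the network, and the propose/measure functions are illustrative assumptions, not the scheme of [50]:

```python
class RunningGain:
    """Toy stand-in for the network: learns y ~ g * x from (x, y) pairs."""
    def __init__(self):
        self.g = 0.0
        self.n = 0

    def update(self, x, y):          # one incremental training step
        self.n += 1
        self.g += (y / x - self.g) / self.n

    def predict(self, x):
        return self.g * x

def incremental_learning(model, propose_input, measure, tol, max_iter=100):
    """Train and synthesize simultaneously: measure the output for a
    proposed input, partially train, and stop when the prediction error
    falls within the predetermined limit."""
    x = propose_input()
    for _ in range(max_iter):
        y = measure(x)               # sensor-recorded output for input x
        model.update(x, y)           # adjust the weights (partial training)
        if abs(model.predict(x) - y) < tol:
            break                    # error within predetermined limits
        x = propose_input()          # next candidate operating condition
    return x, model

x, net = incremental_learning(RunningGain(), lambda: 2.0, lambda v: 3.0 * v, tol=1e-6)
print(round(net.g, 3))  # → 3.0
```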
q(ψ) = qmax [1 - (1 - cos ψ)/(2ε)]^n within the load zone, and 0 elsewhere,

where qmax represents the maximum load, the exponent n depends on the type of bearing involved, ψ is the angle of contact, and ε is the load distribution factor [43]. To simulate defect
growth in the bearing, holes of different sizes were drilled on the bearing races, with the
smallest hole being 0.34 mm in diameter. The experiments were conducted by measuring
vibration signals from the bearing and correlating the results of the spectral analysis to the
specific bearing speed (rpm) and loads, for each hole size (defect). To validate the
reproducibility of the data analysis, each data point was sampled three times. The bearing
speed was varied from 300 rpm to 900 rpm. The upper limit of the load applied to the bearing was determined from the design specification sheet, which was 300 psi when converted to the setting on the hydraulic system.
8.4.1 Network Input Feature Construction
In order to reliably diagnose faults in a bearing, it is critical to select feature(s) that can
quantitatively describe the condition of the bearing vibrations, and use these features as
inputs to the diagnosing neural network. Since diagnosis essentially involves pattern
recognition, the goal of the neural network is to recognize the pattern of the relevant fault
features. Realistically, the presence of noise in the vibration spectrum and the fact that a
feature may represent a multitude of failure criteria complicates the problem. Furthermore,
the number of features to be used as the input to the neural network also affects the final
performance: too many input features will result in high computational load and slow
response, whereas too few features may not provide an accurate representation of the
defect. Ultimately, parameters that do not contribute to the diagnosis of faults should be
rejected.
An algorithm has been proposed in [33] to extract an optimal parameter (feature) set
from a candidate set. The two main criteria used for determining if a parameter should be
included in the set or not are the sensitivity and consistency of the parameter. The
sensitivity Stj of a parameter is used to evaluate its classificatory ability (or contribution to
the optimal set of features), and is defined as:
(3)
where xi represents the parameter, yi the condition of the machinery being monitored (e.g.
the "health" of a bearing), and j refers to the signal pattern. Using the classificatory result,
the output set of a back propagation network was trained by a learning algorithm. Feature
selection was viewed as a special case of feature extraction, where the mapping between the
feature parameter x and the classificatory result y was considered to be a linear mapping.
For the study conducted in [51], all the parameters used for the bearing vibration
analysis were considered likely candidates for the feature set to a neural network. The
parameters considered in the time domain included average amplitude values, Root Mean
Square values, velocity, displacement, skew, kurtosis, and crest factor. The parameters
considered in the frequency domain included Ball Spin Frequency (BSF), Ball Pass
Frequency in Outer Race (BPFO), Ball Pass Frequency in Inner Race (BPFI), and the
energy dissipated in the bearing, which is given by the area under the spectral curves. The
reason to consider all these parameters as likely candidates for the feature sets was to
identify the best suited parameters in a systematic and comprehensive way.
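A minimal sketch of how the time-domain candidates might be computed from a raw vibration record; the exact estimators used in [51] are not specified, so plain population statistics are assumed here:

```python
import math

def time_domain_features(signal):
    """Candidate time-domain features for the diagnosis network:
    RMS, skew, kurtosis, and crest factor of a vibration record."""
    n = len(signal)
    mean = sum(signal) / n
    dev = [s - mean for s in signal]
    rms = math.sqrt(sum(s * s for s in signal) / n)
    std = math.sqrt(sum(d * d for d in dev) / n)
    skew = sum(d ** 3 for d in dev) / (n * std ** 3)
    kurt = sum(d ** 4 for d in dev) / (n * std ** 4)
    crest = max(abs(s) for s in signal) / rms
    return {"rms": rms, "skew": skew, "kurtosis": kurt, "crest": crest}

print(time_domain_features([1.0, -1.0, 1.0, -1.0]))
```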
In another study, four vibration signals were identified and considered for the input feature construction of a neural network [52]. These include: 1) vibration due to outer-raceway defects with the frequency fBPFO, 2) vibration due to inner-raceway defects of frequency fBPFI, 3) vibration due to ball rotation along the raceway, with the basic frequency fBSF, and 4) vibration due to misalignment and/or unbalance with the frequencies of 2fr and fr, respectively (with fr being the shaft speed). To enhance the feature extraction ability of the system for incipient defects, a combined wavelet and Fourier analysis was used to extract the features of defect vibrations. The four features constructed for the neural network included:
x1: RMS of the first four harmonic peaks of the outer-raceway defect vibration, extracted from a combined wavelet-Fourier analysis:

x1 = [(1/4) Σ(k=1..4) F^2(k fBPFO)]^(1/2)

(4)

x2: RMS of the first four harmonic peaks of the inner-raceway defect signal, extracted from a combined wavelet-Fourier analysis:

x2 = [(1/4) Σ(k=1..4) F^2(k fBPFI)]^(1/2)

(5)

x3: RMS of the two peaks of unbalance vibration (F(fu)) and misalignment vibration (F(fm)) in the spectrum (F(f)) of the Fourier analysis:

x3 = [(F^2(fu) + F^2(fm))/2]^(1/2)

(6)
Using this approach the neural network was trained with initial conditions, and the output was the defect severity given by:

d = 1 - exp(-δi/Δop)

(10)

where δi is the defect diameter and Δop is the critical defect diameter. When δi = Δop, the defect severity d evaluates to 1 - e^-1 ≈ 0.63, which was used as the alarm threshold. For a value of d < 0.63, the operation of the bearing was classified as "safe". A danger threshold was defined for δi = 2Δop, which gives a defect severity of 1 - e^-2 ≈ 0.86. The bearing condition was classified as "danger" for values of d between 0.63 and 0.86. If the value is greater than this, the bearing is said to have "failed". For multiple defects, the complementary terms of the individual indices were multiplied and the overall defect severity of the bearing was quantified by:

d = 1 - (1 - d1)(1 - d2)···(1 - dk)

(11)

where d1, d2, ..., dk are the individual defect severities pertaining to each defect. The health index of the bearing was then defined as:

h = l-d
(12)
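Assuming the exponential severity law implied by the reported thresholds (1 - e^-1 ≈ 0.63 at the critical defect diameter, 1 - e^-2 ≈ 0.86 at twice that diameter), the severity and health indices can be sketched as:

```python
import math

def defect_severity(delta, delta_op):
    """Severity implied by the reported thresholds: d = 0.63 at
    delta = delta_op and d = 0.86 at 2*delta_op (exponential law)."""
    return 1.0 - math.exp(-delta / delta_op)

def overall_severity(severities):
    """Combine individual defects through their complementary health terms."""
    h = 1.0
    for d in severities:
        h *= (1.0 - d)
    return 1.0 - h

def health_index(severities):
    """Health index h = 1 - d, per Eq. (12)."""
    return 1.0 - overall_severity(severities)

print(round(defect_severity(1.0, 1.0), 2))  # → 0.63 (alarm threshold)
print(round(defect_severity(2.0, 1.0), 2))  # → 0.86 (danger threshold)
```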
one to seven. This resulted in 28,665 combinations to be analyzed. Out of the total number
of combinations, the objective was to determine the best combination, which gave the least
error. The network was trained using the back-propagation algorithm. The number of
epochs was set to 2000 because no changes in the mean squared error were seen after this
value.
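The exhaustive search pattern can be illustrated with a smaller candidate set (the study's own candidate set was larger, yielding 28,665 combinations); each subset would then define the input layer of one trained network whose mean squared error is compared:

```python
from itertools import combinations

def enumerate_feature_sets(parameters, max_size=7):
    """Enumerate every candidate input set of one to max_size parameters,
    as in the exhaustive search over network input combinations."""
    for k in range(1, max_size + 1):
        yield from combinations(parameters, k)

# Illustrative subset of the candidate parameters described in the text.
params = ["Energy", "BPFO", "BPFI", "Kurtosis", "Crest factor"]
subsets = list(enumerate_feature_sets(params, max_size=3))
print(len(subsets))  # → 25, i.e. C(5,1) + C(5,2) + C(5,3)
```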
The study concluded that the parameter "energy" was present in all of the "best"
performing networks. The 'best' networks were chosen with respect to the least mean
squared error for the overall network, as shown in Table 1. The entries in the first column
represent the number of input nodes followed by the number of hidden and output nodes.
The error was seen to be the least for 40 hidden nodes; it increased when the number of hidden nodes was either increased or decreased from this value.
Table 1: Best performing networks
Nodes     Parameters                   Error (x10^-3)
5-30-1    Energy, BPFO, BPFI, BPFO2    2.2
5-35-1    Energy, BPFO, BPFI, BPFO3    2.2
5-40-1    Energy, BPFO, BPFI, BPFO4    2.1
Energy was found to be a feature in over 90 % of the top 500 combinations, followed by
the BPFI and BPFO as the second most important factors. In addition, the first harmonics
for the ball passing frequency for both inner (BPFI2) and outer raceways (BPFO2), and
kurtosis were found to play a major role. The best network without the energy parameter
consisted of crest factor, BPFO and BPFO2. It had an error of 2.5x10-3, which was 14%
higher than the error for the best network. The occurrence of the BPFO factor can be
explained by the fact that the defect was on the outer raceway initially. Examining the
occurrence of parameters in the top 100 combinations, it was found that RMS value, RPM,
load and crest factor were not very effective in identifying bearing defects (Table 2).
Table 2: Parameter occurrence in top 100 combinations
Parameter           Occurrence (%)
Energy              99
BPFO                84
BPFI                81
BPFI2               57
Kurtosis            56
BPFO2               55
Max. Speed          55
Max. Displacement   54
Crest factor        2
RPM                 0
RMS value           0
Load                0
The occurrence of the crest factor appeared to be random. Based on the occurrence of these parameters in the best combinations, a revised combination could be chosen so that the total number of parameters used would be much smaller and only the relevant parameters would be emphasized, for better quality results.
No single neural network based on one unique set of operating parameters was found to
be completely successful in diagnosing faults in the test bearing under all operating
conditions. To solve this problem, the entire operating spectrum was divided into sixteen regions, each containing a specific combination of load and speed values at which experiments were conducted. Subsequently, sixteen different neural networks were designed and applied to these regions. This approach enables a more adaptive, condition-specific solution to be provided for the system being monitored. The division of the adaptive areas and the use of separate neural networks to form a layered analysis structure are illustrated in Figure 7.
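A sketch of the region-based dispatch, with hypothetical bin boundaries spanning the 300-900 rpm and 0-300 psi ranges used in the experiments; the actual region boundaries are not given in the text:

```python
def region_index(speed_rpm, load_psi, speed_edges, load_edges):
    """Map an operating point to one of 4 x 4 = 16 load/speed regions,
    each served by its own trained diagnostic network."""
    def bin_of(value, edges):
        for i, edge in enumerate(edges):
            if value < edge:
                return i
        return len(edges)
    return bin_of(speed_rpm, speed_edges) * 4 + bin_of(load_psi, load_edges)

# Hypothetical region boundaries (assumed, for illustration only).
SPEED_EDGES = [450, 600, 750]   # rpm
LOAD_EDGES = [75, 150, 225]     # psi
networks = {i: f"net_{i}" for i in range(16)}  # one network per region

print(networks[region_index(500, 100, SPEED_EDGES, LOAD_EDGES)])  # → net_5
```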
The division of the operation spectrum into sixteen regions is shown in Figure 8.
Analysis was made for each region, using all the parameters available as described in the
previous section. Then the nine most often occurring input parameters were analyzed for
the least error. The gray shaded pattern in Figure 8 denotes the importance of the
respective parameters, with dark gray areas illustrating the "best" solution provided by the
parameter for the specified region, and the light gray areas showing their performance as
the "second best" solution.
The pattern revealed the relative importance of various parameters under different
operating conditions. For example, both BPFI and BPFO appear to be essential parameters
for high bearing speeds. This can be explained by the high energy content of the signals
involved at high speeds that makes these two parameters distinctive. The pattern also
confirmed previous analysis using one neural network that energy is a critical parameter for
most of the operation conditions. The crest factor and kurtosis have shown effective
coverage for relatively low operation speeds, and the speed and displacement appeared to
be good indicators under higher load conditions.
To better understand the importance trend, research is being conducted to evaluate the
combined effect of multiple parameters simultaneously for each region. Furthermore, a
"cluster" analysis will be performed on larger regions consisting of the present individual
regions.
In the work reported in [51], two separate neural networks were built to evaluate two
different types of defects on the inner and outer raceways. The architecture of the network
was determined based on experimentation with various combinations of hidden layer nodes. Two sets of input features {x1, x2, x3, x4} were obtained for the defects, using Eqs. (4)-(7). The defect severity output information for these two networks was multiplied to give the overall health of the bearing, as shown in Figure 9.
A total of 960 feature vectors were constructed from the outer and inner raceway
analysis. Three bearings with a point defect in the form of a 0.25 mm and 3 mm hole in the
inner or outer raceway or a combination of both were tested. Two thirds of the feature
vectors were used as training data and the rest for checking purposes. For the inner raceway defect evaluation, the error converged to 0.013 after 2,267 epochs (Figure 10). To test the performance of the network, the checking data were used to classify the input data from the bearing faults under different load, speed and temperature conditions. It was observed that the error in the defect severity of the network was within 0.1 (Figure 11). With an error limit of 0.15/2 for classification, it was concluded that the network had achieved a success rate of 99%.
Figure 10: Learning error curve for inner raceway defect
Figure 11: Defect severity estimation error (a: no defect, d1 = 0)

The crack growth was modeled by a Paris-type law:

da/dN = C0 (ΔK)^n

(13)

where a is the crack length, N is the number of cycles, C0 is a material constant, n is a material exponent, and ΔK is the stress intensity factor range. Assuming that the stress constant remains unchanged for the life of the bearing, the constants C0 and n can be determined based on the experimental data
points, and subsequently, the formula can be used to predict the crack growth. Based on the
growth rate found, a neural network was used to determine the time constant for the
prognosis model by analyzing how the size of a defect has grown in time.
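A minimal forward integration of a Paris-type growth law, assuming a constant stress intensity factor range; the constants below are illustrative only, since the real values come from the fitted experimental data:

```python
def crack_growth(a0, c0, n, delta_k, cycles, step=1000):
    """Integrate da/dN = C0 * (dK)^n forward in the cycle count N,
    assuming a constant stress intensity factor range dK."""
    a = a0
    for _ in range(0, cycles, step):
        a += c0 * (delta_k ** n) * step   # Euler step in cycle count
    return a

# Illustrative constants only; units are arbitrary for this sketch.
print(crack_growth(a0=0.001, c0=1e-12, n=3.0, delta_k=10.0, cycles=100000))
```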
The remaining life of a bearing is defined as the number of cycles or hours at which the
bearing runs under a certain combination of speed and load, before failure is initiated. In
reality, the remaining life of a bearing is influenced by other conditions such as the assembly, temperature, quality of lubrication, etc. To account for the various scenarios, different bearing life models have been proposed [47] that describe the relationship between the bearing condition and these parameters as linear, exponential, or polynomial functions. The four curves in Figure 12 represent such functions, with ψ(t) representing the rate of bearing deterioration with respect to time t.
The input vector to a neural network X = [x1, x2, ..., xn] may contain measurement data from various physical sensors, e.g. load, temperature, displacement, or acoustic emission. In the vector, n is the number of variables, which is equal to the number of input neurons in the network. To avoid large pivots in the neural network calculations, the constituent elements of the input vector can be normalized such that xi ∈ [0, 1]. If a back-propagation neural network with one hidden layer is used, the output function of the neural network can be written as:
O(t,i) = f( Σj w'j f( Σn wnj xn ) )

(14)

where O(t,i) is the predicted output function at time t for the variable xi, and f(·) is the activation function of the neurons. Here, the weights of the interconnections between the input layer and the hidden layer (wnj) and those between the hidden layer and the output layer (w'j) are initialized before the neural network is trained using the experimental data. Once the
training is completed, the neural network is subjected to data gathered from the time steps
(t-1) to (t-p), in order to obtain the failure trend for the particular bearing being monitored.
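The normalization and one-hidden-layer forward pass can be sketched as follows; the sensor ranges, weights, and sigmoid activation are illustrative assumptions, not values from the study:

```python
import math

def normalize(x, lo, hi):
    """Scale a raw sensor reading into [0, 1] before it enters the network."""
    return (x - lo) / (hi - lo)

def forward(x, w_hidden, w_out):
    """One-hidden-layer forward pass: sigmoid hidden units, linear output."""
    hidden = [1.0 / (1.0 + math.exp(-sum(w_i * x_i for w_i, x_i in zip(w, x))))
              for w in w_hidden]
    return sum(w_o * h for w_o, h in zip(w_out, hidden))

# Hypothetical ranges: speed in [300, 900] rpm, load in [0, 300] psi.
x = [normalize(650, 300, 900), normalize(150, 0, 300)]
print(forward(x, w_hidden=[[0.5, -0.2], [0.1, 0.4]], w_out=[1.0, -1.0]))
```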
Given that the crack propagation may not be fully described by an idealized crack geometry and size, and that realistic crack parameters would only be available when measured on a disassembled bearing, it was suggested that an adaptive prognostic scheme be used
[59-60]. The resulting prediction was then compared with the actual condition of the
bearings being monitored, and recursive iteration was implemented to improve the model
performance. Through time-domain integration, the defect size can be expressed as:
ln D = α + β(t - t0)

(16)

where t0 is the time when the smallest defect area (D0) occurs, and α and β are constants depending on the material. These parameters were first estimated, given the fact that they
may vary with the progression of damage. A recursive least squares algorithm was used to
update their values, based on the vibration and acoustic emission measurements conducted
on a defective bearing. The resulting defect propagation model was then coupled with the
defect diagnostic model to adaptively predict the remaining life of the bearing.
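A sketch of the recursive least squares update for the two parameters of the logarithmic growth model; the initial covariance and the synthetic, exponentially growing defect sizes are illustrative assumptions:

```python
import math

def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least squares step for the linear model y = phi . theta,
    here ln D = alpha + beta * (t - t0) with theta = (alpha, beta)."""
    Pphi = [sum(P[i][j] * phi[j] for j in range(2)) for i in range(2)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(2))
    K = [p / denom for p in Pphi]                        # gain vector
    err = y - sum(phi[i] * theta[i] for i in range(2))   # prediction error
    theta = [theta[i] + K[i] * err for i in range(2)]
    P = [[(P[i][j] - K[i] * Pphi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P

# Synthetic defect sizes growing as D = 2^t, so ln D = (ln 2) * t.
theta, P = [0.0, 0.0], [[1e6, 0.0], [0.0, 1e6]]
for t, d in [(0, 1.0), (1, 2.0), (2, 4.0), (3, 8.0)]:
    theta, P = rls_update(theta, P, [1.0, float(t)], math.log(d))
print([round(v, 3) for v in theta])  # alpha near 0, beta near ln 2
```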
8.5. Conclusions
Extensive research over the past decade has turned neural networks into an indispensable tool for solving a wide range of problems, both in scientific labs and on the factory floor. In the specific areas of machine condition monitoring, fault diagnosis, and remaining service life prognosis, neural networks will play an increasingly important role, and their capabilities will be continually enhanced through other innovative and complementary technologies.
Research is continuing in the author's group, with the ultimate goal to develop effective and
efficient bearing condition monitoring and diagnostic techniques that can be applied to
solving real-world problems.
Acknowledgment
Research described in this paper was sponsored by the US National Science Foundation under CAREER
award #DMI-9624353. Support from the SKF Corporation is appreciated. The author is grateful for the valuable contributions and assistance of his former and present graduate students Dr. C. Wang, Dr. B. Holm-Hansen, M. Kaczorowski, and A. Malhi.
References
[1] G. Byrne, D. Dornfeld, I. Inasaki, G. Ketteler, W. Konig and R. Teti, "Tool condition monitoring - The status of research and industrial application", Annals of the CIRP, Vol. 44, No. 2, pp. 541-567, 1995.
[2] K. Ng, "Overview of machine diagnostics and prognostics", Symposium on Quantitative Nondestructive Evaluation, ASME IMECE Conference, Dallas TX, November, 1997.
[3] P. Keller, R. Kouzes and L. Kangas, "Neural network based sensor systems for manufacturing applications", Advanced Information Systems and Technology Conference, Williamsburg, VA, PNL-SA-23252, 28-30 March, 1994.
[4] S. Billington, Y. Li, T. Kurfess, S. Liang and S. Danyluk, "Roller bearing defect detection with multiple sensors", Proceedings of the 1997 ASME International Mechanical Engineering Congress and Exposition, Tribology Division, Vol. 7, pp. 31-36, 1997.
[5] P. Tse and D. Wang, "A hybrid neural network based machine condition forecaster and classifier by using multiple vibration parameters", IEEE International Congress on Neural Networks, Vol. 4, pp. 2096-2100, 1996.
[6] J. Kline and J. Bilodeau, "Acoustic wayside identification of freight car roller bearing defects", Proc. of ASME/IEEE Joint Railroad Conference, Vol. 6, pp. 79-81, 1998.
[7] S. Braun and B. Datner, "Analysis of roller/ball bearing vibrations", ASME Journal of Mechanical Design, Vol. 101, pp. 118-124, 1979.
[8] D. Dyer and R. Stewart, "Detection of rolling element bearing damage by statistical vibration analysis", ASME Journal of Mechanical Design, Vol. 100, pp. 229-235, 1978.
[9] J. Breggren, "Diagnosing faults in rolling element bearings: Part 1", Vibrations, Vol. 4, No. 1, pp. 5-13, 1988.
[10] T. Igarashi and S. Yabe, "Studies on the vibration and sound of defective rolling bearings", Bulletin of JSME, Vol. 26, No. 220, pp. 1791-1798, 1983.
[11] N. Tandon and B. Nakra, "Defect detection in rolling element bearings by acoustic emission method", Journal of Acoustic Emission, Vol. 9, No. 1, pp. 25-28, 1990.
[12] C. Tan, "Application of acoustic emission to the detection of bearing failures", Proc. of the Engineers of Australia Tribology Conference, Brisbane, pp. 110-114, Dec. 3-5, 1990.
[13] K. Mori, N. Kasashima, T. Yoshioka and Y. Ueno, "Prediction of spalling on a ball bearing by applying the discrete wavelet transform to vibration signals", Wear, Vol. 195, No. 1-2, pp. 162-168, 1996.
[14] A. Gibson and L. Stein, "Reduced order finite element modeling of thermally induced bearing loads in machine tool spindles", Proc. of ASME, DSC Vol. 67, pp. 845-852, 1999.
[15] K. Goddard and B. MacIsaac, "Use of oil borne debris as a failure criterion for rolling element bearings", Lubrication Engineering, Vol. 51, No. 6, pp. 481-487, 1995.
[16] T. Moriwaki, Presentation at working group meeting, Proc. of First Workshop on Tool Condition Monitoring - CIRP, Paris, January 1993.
[17] B. Holm-Hansen and R. Gao, "Vibration analysis of a sensor-integrated ball bearing", ASME Journal of Vibration and Acoustics, Vol. 122, pp. 384-392, 2000.
[18] R. Gao and P. Phalakshan, "Design consideration for a sensor integrated roller bearing", Proc. ASME International Mechanical Engineering Conference and Exposition, Symposium on Rail Transportation, RTD-Vol. 10, pp. 81-86, 1995.
[19] B. Holm-Hansen and R. Gao, "Smart bearing utilizing embedded sensors: design considerations", Proc. SPIE 4th International Symposium on Smart Structures and Materials, Paper No. 304151, San Diego, CA, pp. 602-610, 1997.
[20] C. Wang and R. Gao, "Sensor module for integrated bearing condition monitoring", Proc. ASME Dynamic Systems and Control Division, Vol. 67, pp. 721-728, 1999.
[21] N. Tandon, "A comparison of some vibration parameters for the condition monitoring of rolling element bearings", Journal of the International Measurement Confederation, Vol. 12, No. 3, pp. 285-289, 1994.
[22] R. Heng and M. Nor, "Statistical analysis of sound and vibration signals for monitoring rolling element bearing condition", Applied Acoustics, Vol. 53, No. 1-3, pp. 211-226, 1998.
[23] W. Staszewski and G. Tomlinson, "Application of the moving window procedure in spur gear", COMEDEM-93, Bristol, England, July 21-23, 1993.
[24] R. Randall, "Cepstrum analysis and gearbox fault diagnosis", Bruel and Kjaer Application Note, pp.
233-280, 1982.
[25] P. McFadden and W. Wang, "Time frequency domain analysis of vibration signals for machinery
diagnostics: introduction to Wigner-Ville distribution", Technical Report, Department of Engineering
Science, Oxford University, Report No. OUEL 1859/90, 1990.
[26] P. McFadden, "Application of wavelet transform to early detection of gear failure by vibration
analysis", Proc. International Conference of Condition Monitoring, University College of Swansea,
Wales, 1994.
[27] I. Alguindigue, A. Loskiewicz-Buczak and R. Uhrig, "Monitoring and diagnosis of rolling element bearings using artificial neural networks", IEEE Transactions on Industrial Electronics, Vol. 40, No. 2, pp. 209-217, April 1993.
[28] B. Paya, M. Badi and I. Esat, "Artificial neural network based fault diagnostics of rotating machinery using wavelet transforms as a preprocessor", Mechanical Systems and Signal Processing, Vol. 11(5), pp. 751-765, 1997.
[29] D. Baillie and J. Mathew, "A comparison of autoregressive modeling techniques for fault diagnosis of
rolling element bearings", Mechanical Systems and Signal Processing, Vol. 10, pp. 1-17, 1996.
[30] J. Shiroishi, Y. Li, S. Liang, T. Kurfess and S. Danyluk, "Bearing condition diagnostics via vibration and acoustic emission measurements", Mechanical Systems and Signal Processing, Vol. 11(5), pp. 693-705, 1997.
[31] G. Krell, A. Herzog and B. Michaelis, "An artificial neural network for real time image restoration",
Proc. IEEE Instrumentation and Measurement Technology Conference IMTC'96, Brussels, Belgium,
pp. 833-838,1996.
[32] K. Van Laerhoven, K.-Aidoo and S. Lowette, "Real-time analysis of data from many sensors with
neural networks", Proc. of the 4th International Symposium on Wearable Computers, ISWC, Zurich,
Switzerland, IEEE Press, 2001.
[33] Y. Shao, K. Nezu, K. Chen and X. Pu, "Feature extraction of machinery diagnosis using neural networks", IEEE International Congress on Neural Networks, Vol. 1, pp. 459-464, 1995.
[34] C. Bunks and D. McCarthy, "Condition-based maintenance of machines using hidden Markov models", Mechanical Systems and Signal Processing, Vol. 14(4), pp. 597-612, 2000.
[35] T. Tallian, "A data fitted bearing life prediction model", Tribology Transactions, Vol. 39, pp. 249-258, 1996.
[36] P. Eschmann, L. Hasbargen and K. Weigand, "Ball and roller bearings: their theory, design and
application", K. G. Heyden and Co. Ltd., London, 1958.
[37] M. Hartnett, "Analysis of contact stresses in rolling element bearings", ASME Journal of Lubrication
Technology, Vol. 101, No. 1, pp. 105-109, 1979.
[38] A. Duquette, "FAA orders inspections of GE90 engines installed on Boeing 777 aircraft", FAA News,
APA 6397, 1997.
[39] A. Duquette, "FAA/Industry to improve engine inspections", FAA Press release, APA 6397, 1997.
[40] R. Schoen, T. Habetler, F. Kamran and R. Bartheld, "Motor bearing damage detection using stator
current monitoring", IEEE Transactions on Industry Applications, Vol. 31, No. 6, pp. 1274-1279,
1995.
[41] J. Berry, "How to track rolling element bearing health with vibration signature analysis", Sound and Vibration, Vol. 25, No. 11, pp. 24-35, 1991.
[42] A. Barkov and N. Barkova, "Condition assessment and life prediction of rolling element bearings",
Sound and Vibration, www.vibrotek.com/articles/sv95/partl/index.htm. June and September, 1995.
[43] T. Harris, "Rolling bearing analysis", 3rd. Ed., Wiley, New York, 1991.
[44] P. Tse and D. Atherton, "Prediction of machine deterioration using vibration based fault trends and
recurrent neural networks", Journal of Vibration and Acoustics, Vol. 121, pp. 355-362, 1999.
[45] B. Holm-Hansen, "Development of a self-diagnostic rolling element bearing", PhD Dissertation,
University of Massachusetts, Amherst, MA, September, 1999.
[46] S. Bukkapatnam, S. Kumara and A. Lakhtakia, "Fractal estimation of flank wear in turning", ASME
Journal of Dynamic Systems, Measurement and Control, Vol. 122, pp. 89-94, 2000.
[47] Y. Shao and K. Nezu, "Prognosis of remaining bearing life using neural networks", Proceedings of the Institution of Mechanical Engineers - Journal of Systems and Control Engineering, Vol. 214(3), pp. 217-230, 2000.
[48] A. Sokolowski, M. Rehse and D. Dornfeld, "Feature selection in tool wear monitoring using fuzzy
logic and genetic algorithms", LMA Research Reports, University of California at Berkeley, pp. 91-97,
1993.
[49] M. Rehse, "In process tool wear monitoring of multi spindle drilling using multi sensor system",
Diplomarbeit, LMA/University of California at Berkeley and WZL/RWTH Aachen, 1993.
[50] S. Rangwala and D. Dornfeld, "Learning and optimization of machining operations using computing abilities of neural networks", IEEE Transactions on Systems, Man and Cybernetics, Vol. 19, No. 2, pp. 299-314, 1989.
[51] M. Kaczorowski, "A neural network approach for ball bearing life prognosis". Project Report,
Mechanical and Industrial Engineering Department, University of Massachusetts, May, 2001.
[52] C. Wang, "Embedded Sensing for Online Bearing Condition Monitoring and Diagnosis", PhD
Dissertation, University of Massachusetts, Amherst, MA, May, 2001.
[53] S. Zhang, R. Ganesan, and G. Xistris, "Self-organizing neural networks for automated machinery monitoring systems", Mechanical Systems and Signal Processing, Vol. 10(5), pp. 517-532, 1996.
[54] N. Roehl, C. Pedreira, and H. Teles de Azevedo, "Fuzzy ART neural network approach for incipient
fault detection and isolation in rotating machines", IEEE International Conference on Neural
Networks, Vol. 1, pp. 538-542, 1995.
[55] G. Betta and A. Pietrosanto, "Instrument fault detection and isolation: State of the art and new research trends", IEEE Transactions on Instrumentation and Measurement, Vol. 49, No. 1, pp. 100-106, 2000.
[56] C. Rodriguez, S. Rementeria, J. Martin, A. Lafuente, J. Muguerza and J. Perez, "A modular neural network approach to fault diagnosis", IEEE Transactions on Neural Networks, Vol. 7, No. 2, pp. 326-339, 1996.
[57] Z. Chen and J. Maun, "An artificial neural network based real-time fault locator for transmission
lines", Proc. IEEE International Conference on Neural Networks, Vol. 1, pp. 63-68, 1997.
[58] M. Hoeprich, "Rolling element bearing fatigue damage propagation", ASME Journal of Tribology, Vol. 114, pp. 328-333, 1992.
[59] Y. Li, S. Billington, C. Zhang, T. Kurfess, S. Danyluk and S. Liang, "Adaptive prognostics for rolling
element bearing condition", Mechanical Systems and Signal Processing, Vol. 13(1), pp. 103-113,
1999.
[60] Y. Li, S. Billington, C. Zhang, T. Kurfess, S. Danyluk and S. Liang, "Dynamic prognostic prediction
of defect propagation on rolling element bearings", Tribology Transactions, Volume 42, pp. 385-392,
1999.
Chapter 9
Neural Networks
for Measurement and Instrumentation
in Robotics
Mel SIEGEL
The Robotics Institute, School of Computer Science, Carnegie Mellon University
Pittsburgh, PA 15213, USA
Abstract. The chapter begins with a historical review of the parallel
conceptualization of neural networks and intelligent machines. Neural networks
were actually created as brain models for the perceptual, cognitive, and actuation
systems of future robots. We then develop the title topic, largely via examples.
First we present a case that illustrates the architectural issues that we developed
abstractly in the introduction.
The particular application involves image
understanding for robot navigation on a surface. Next we present a broad sample of
the intersection between neural networks and robotics via thirteen distinct projects,
each briefly summarized, illustrated, and referenced; each ends with a summary of
issues, problems, and techniques that are raised or clarified by the example. In the
last part we present detailed examinations of two cases. The first case involves
image understanding for detection and characterization of aircraft surface flaws. The
second case involves interpretation and fusion of signals from solid-state chemical
sensors. The flaw detection case effectively illustrates a situation wherein either
neural network or fuzzy logic technology is potentially applicable, but in practice
one or the other works better for specific types of flaws; we speculate that the
difference is related to the contrasting nature of the cognitive skills required to
accomplish the two tasks.
The chemical sensor case similarly contrasts classification and quantitation applications of neural networks; both capabilities are required for different aspects of the practical problem. Both general and case-specific literatures are briefly reviewed at the end of the introduction; a complete list of references, including URL pointers to on-line papers or abstracts, is provided at the end of the chapter.
9.1. Instrumentation and measurement systems for robotics: issues, problems, and
techniques
9.1.1 Historical review
The first connection between neural networks and robotics can be dated to the first
discussion of what we would today call neural networks. Rosenblatt, in the seminal
"Principles of Neurodynamics" [1], stated that the aim of his work on "perceptrons" was to
build mathematical models of how brain-like systems might be organized based on
available biological evidence. His perspective corresponded exactly with the modern
"sense, think, act" - to which I would add "communicate" - paradigm for robotic systems,
with a brain model intimately based on inputs from sensors, outputs to actuators, and active
learning about the environment. This early close connection to robotics was supplanted, in
the 1980s, by a more abstract focus on systems that "learn" arbitrary transfer functions by
iterative fitting of general function parameters.
So, despite later backsliding toward abstraction, the earliest efforts - from the mid-1950s through the mid-1960s - were firmly grounded in an explicit robotic model. A
hardware implementation of the "perceptron" architecture had a small (~16x16) photocell
array simulating a binary retina, neural network nodes realized by relays that responded to
the sum of several input currents, inter-node weights realized by motor-driven rheostats,
and an iterative training regimen that included positive and negative feedback for "reward"
and "punishment". These hardware implementations could learn, e.g., block letters with
arbitrary x-y translations on the retina, but generally not rotations, partial letters, or other
transformations and degradations that we would today regard as essential tests of an ability
to generalize and abstract. Figure 1 illustrates the "Mark I Perceptron".
Early neural network terminology was actually much closer to modern robotic
terminology than is modern neural network terminology. The bland, abstract modern terms
"input units", "hidden units", and "output units" were originally "sensory units" (or "S-units"), "associative units" (or "A-units"), and "response units" (or "R-units"), terms that
require practically no explanation to modern practitioners of robotics. The early model is
illustrated in Figure 2.
Figure 3: (top) Model of "perceptron" in an adaptive control loop, (bottom) its equivalent in terms
of corresponding biological structures and functions.
binary output, say +1 if its input signal exceeds a threshold, and 0 otherwise. Although
somewhat restrictive from the perspective of the sensing and measurement scientist, i.e.,
sensors that respond fundamentally to environmental parameters other than deposited
energy are well known, allowing the response to be "some function of the input energy"
surely encompasses all the actual possibilities. In any case, the firm grounding of S-units,
or in modern usage input units, to the transduction of information, carried by energy,
between the environment and the measurement system, is refreshingly concrete in
comparison with modern abstract formulations of neural network fundamentals.
Association-units, or A-units, are "signal generating units, typically logical decision
elements, having input and output connections". These clearly correspond exactly to the
"hidden units" of modern neural network theory. A simple A-unit is defined as a logical
decision element that generates a binary output signal, say +1 if the algebraic sum of its
input signals exceeds a threshold and 0 otherwise. The term active is employed to
designate an A-unit that is activated, i.e., whose output state is +1.
Response-units, or R-units, are defined as "signal generating units having input
connections, and emitting signals that are transmitted outside the network, i.e., to the
environment or external system". These clearly correspond exactly to the "output units" of
modern neural network theory. A simple R-unit is defined as an R-unit that generates
a binary output signal, say +1 if the algebraic sum of its input signals is strictly positive, -1
if the algebraic sum of its input signals is strictly negative, and either zero or indeterminate
or perhaps oscillatory if the algebraic sum of its input signals is zero.
With these definitions - and equivalent mathematical notation - Rosenblatt went on to
define the perceptron and the simple perceptron in a way that most current readers will
recognize as corresponding to the basic modern definition of a neural network. The
perceptron is defined as "a network of S-, A-, and R-units with a variable interaction matrix
V (formally defined previously) that depends on the sequence of past activity states of the
network". Concretely, the definition of a simple perceptron say it is a perceptron in which
"(i) there is only one R-unit, with a connection from every A-unit; (ii) connections are only
from S-units to A-units and from A-units to R-units; (iii) the weights of the S-unit to A-unit
connections do not change with time; (iv) transmission time between units is constant; (v)
the output signal of any unit is a function of the algebraic sum of its input signals". This
definition is easily seen to correspond very closely to the modern definition of a three-layer,
fully connected neural network with one output node and no intra-layer coupling; the main
difference is that the general modern neural network definition would not manifestly
require the S-to-A coupling to be fixed in time. Note, however, that "time" in this context
refers to running time, not training epoch - indeed, training has not yet been mentioned - and in typical modern practice no weights would be changed during runtime unless there
were an explicit hybrid run-train strategy.
Reinforcement, what we would now call learning, enters via the definition of the
reinforcement system, "a set of rules by which the interaction matrix (or memory state) of a
perceptron may be altered through time", and the reinforcement control system, a
"mechanism external to a perceptron that is capable of altering the interaction matrix of the
perceptron in accordance with the rules of a specified reinforcement system". Implicit in
the latter definition, of course, is that the strategy is something sensible, like "increase
weights that contribute to a correct response and decrease (or increase negatively) weights
that contribute to an incorrect response".
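These definitions, together with a sensible reinforcement rule, are enough to sketch a working simple perceptron. The sketch below is an illustrative reconstruction in Python, not the Mark I hardware: the random fixed S-to-A weights, the thresholds, and the learning rate are all assumptions.

```python
import random

# Illustrative reconstruction of a "simple perceptron" (assumed values,
# not the Mark I): fixed random S-to-A weights, a trainable A-to-R
# weight vector V, and a reward/punish error-correction rule.

def make_perceptron(n_s, n_a, seed=0):
    rng = random.Random(seed)
    # condition (iii): S-to-A weights do not change with time
    s_to_a = [[rng.choice([-1, 0, 1]) for _ in range(n_s)] for _ in range(n_a)]
    a_to_r = [0.0] * n_a          # trainable "interaction matrix" V
    return s_to_a, a_to_r

def forward(s_to_a, a_to_r, stimulus, theta=0.5):
    # A-unit: +1 ("active") if the algebraic sum of inputs exceeds a threshold
    a = [1 if sum(w * x for w, x in zip(row, stimulus)) > theta else 0
         for row in s_to_a]
    # R-unit: sign of the algebraic sum of its inputs (0 when the sum is zero)
    net = sum(v * ai for v, ai in zip(a_to_r, a))
    return (1 if net > 0 else -1 if net < 0 else 0), a

def reinforce(a_to_r, a, target, response, eta=0.1):
    # increase weights contributing to a correct response,
    # decrease (or increase negatively) those contributing to an incorrect one
    if response != target:
        for i, ai in enumerate(a):
            if ai:
                a_to_r[i] += eta * target
```

Iterating `forward` and `reinforce` over a labeled stimulus set is the "training regimen"; note that the S-to-A weights never change, per condition (iii).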
Finally, Rosenblatt defines an experimental system: "a system consisting of a
perceptron, a stimulus world, and a reinforcement control system; the reinforcement control
system may be an automatic regulating device, e.g., a thermostat, or a human operator,
capable of responding to the responses of the perceptron and the stimuli in the environment
by applying the appropriate reinforcement rules, altering the memory state of the
perceptron". In other current terminology, this is a robot.
By the early 1960s, theorems had been stated and proved to the effect that solutions,
i.e., weight matrices, exist that map specified kinds of input spaces into specified kinds of
output spaces, and that these weight matrices can be found in finite time by iterative
procedures, i.e., "training". However, no practical implementation strategy was found until,
in the mid-1980s, the rediscovery of the "back propagation (of error)" algorithm made
actual implementation practical.
9.1.3 Example of a neural net application in robotics: how to make a machine vision
system see lines of rivets on an aircraft skin
In the following section we will review a variety of applications so as to convey something
of the scope of neural network technology in robotics. For the sake of completeness before
ending this introductory section, we will briefly survey one of the applications that we will
cover in increasing detail in sections 2 and 3.
The problem relates to navigation of a mobile robot that traverses an aircraft's skin
(using suction cups to adhere to the sides and belly) looking for cracks, corrosion,
mechanical damage, and other flaws [19]. The part of this problem of interest now is not
the inspection sensing technology per se, but rather the proprioceptive ("self-awareness")
technology that the robot needs in order to traverse the skin surface in a systematic and
knowledgeable way. Aircraft features are normally described in an embedded coordinate
system based on enumerating the circumferential and longitudinal "lines of rivets" that
attach the skin to the airframe skeleton. The problem for us is that "lines of rivets" are an
abstraction constructed by the human eye-brain system: in reality there are no lines, there
are only rivets that our eye-brain abstracts into lines. Our machine vision task is then to
develop an algorithm that can reliably find "lines of rivets" in video imagery of aircraft
skin. The problem is difficult in the vision technology domain as well as in the
computational algorithm domain: the contrast of metal rivets in a metal skin is very low,
and the difficulty is exacerbated by specular reflections. Figure 5 shows an example of a
low-resolution, low-contrast image with five rivets "in a line", plus one rivet below the line.
Our approach is to train a neural network operator to recognize the "rivetness" quality
of a square window on the image. Training is under human supervision. The output of the
neural network "rivetness" operator is illustrated in Figure 6. This is a 15x15 pixel input
network whose output is a measure of the similarity of the pixels under the current window
to the rivet-containing windows in the training set. Contrast of metallic rivets against
metallic skin is greatly enhanced by this operator.
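The windowed scoring described above can be sketched as a simple sliding-window sweep. The trained network itself is abstracted here as any callable returning a scalar score, since [19] does not publish its weights; `net`, the 15-pixel window default, and the zero border are assumptions.

```python
import numpy as np

# Sketch of applying a trained "rivetness" operator: a 15x15 window is
# slid across the image and the network's scalar similarity score is
# written at the window centre. `net` is a hypothetical stand-in for the
# trained model, mapping a flattened window to a score.

def rivetness_map(image, net, win=15):
    h, w = image.shape
    half = win // 2
    out = np.zeros((h, w))          # borders are left at zero (assumption)
    for r in range(half, h - half):
        for c in range(half, w - half):
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            out[r, c] = net(patch.ravel())
    return out
```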
Figures 7 and 8 illustrate the remaining steps. Figure 7 shows the result of applying a
conventional edge-finding algorithm to the "rivetness" image; contrast is further
enhanced, i.e., the fuzziness of the "rivetness" image is removed. The combined result of
the last two steps is shown in Figure 8. First, region filling and binarization completely and
accurately isolate the rivets. Then a robust line-fitting algorithm draws the "line of rivets".
The robust algorithm is needed to reject the outlier rivet below the line: if it were not
rejected, the line obtained would not make human sense, nor would it be practically useful.
In summary, in a typical application the neural net is one step in a pipeline of
algorithms; each step is typically simple and more-or-less standard; the "magic" is in the
choice and order of the steps. In this example, we began with noisy, low resolution, low
contrast image data, and employed steps that:
- accented features of interest for the navigation problem, i.e., "rivetness", using a neural
network operator;
- sharpened features of interest using a standard edge-finding operator;
- enhanced features of interest using standard region-filling and thresholding;
- applied a "robust line fitting" algorithm to find the rivet line in the image, free of
perturbation by outliers.
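The outlier-rejecting line fit in the last step can be sketched with a RANSAC-style consensus search. [19] does not specify the exact robust estimator used, so this particular scheme, its tolerance, and its iteration count are assumptions.

```python
import random

# RANSAC-style robust line fit (an assumed scheme, not necessarily that
# of [19]): repeatedly fit a line through two random rivet centres and
# keep the line with the most inliers; the outlier rivet is rejected.

def robust_line(points, tol=2.0, iters=200, seed=0):
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2 and y1 == y2:
            continue                      # degenerate sample
        # line through the two points as a*x + b*y + c = 0, with (a, b) normalized
        a, b = (y2 - y1), (x1 - x2)
        n = (a * a + b * b) ** 0.5
        a, b = a / n, b / n
        c = -(a * x1 + b * y1)
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b, c), inliers
    return best, best_inliers
```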
in [1], and on the other hand, by its pessimism, for a decade nearly closed off research in
the field. The other general references, some to purely web-based resources, may or may not
appeal to particular readers, depending on their individual preferences and perspectives.
9.1.5 Summary
To summarize:
- Neural network technology grew out of early thinking about perceptive active
"creatures" that can learn about their environments.
- The resulting paradigm was later recognized to be a useful practical technology for
creating functions that fit, interpolate, and perhaps extrapolate data ("generalization").
- A neural net that is one link in a control loop is often the intellectual/instrumentation
signal-to-understanding translation element.
Figure 9 pictorially represents this summary in the form of an architecture for a closed-loop
system with sensing, understanding, and acting elements.
Figure 9: Architectural model for neural network for measurement and control in robotic
applications. The bracketed portion labeled PERCEPTRON is a neural network. Its S-unit (sensory, input) layer receives stimuli from the ENVIRONMENT, and can influence
the environment via its R-unit (response, output) activity. The REINFORCEMENT
CONTROL SYSTEM monitors the ENVIRONMENT and the RESPONSE, and corrects
(V CONTROL) the connection weights to bring the actual response into closer accord
with the desired response. To close the control loop and achieve some practical action of,
e.g., a robot, the R-unit output is interpreted in the block labeled
DATA/MEASUREMENT/INFORMATION. This block's output provides data needed by a
MOTION CONTROL SYSTEM (which may itself be a neural network, e.g., model of
robot system dynamics) to effect useful action in the ENVIRONMENT. The MOTION
CONTROL SYSTEM output provides power and information to the EFFECTOR
SYSTEM (actuators, robots). The control loop is closed when the EFFECTOR system's
output modifies the environment.
- What are the consequences of the fact that, typically, the neural network has many more
degrees of freedom (parameters) than the physical system it models?
- How does the neural network's "holographic" memory aid (or detract from) system
robustness?
- Once the net is trained, should you expect to be able to look inside it and understand
why, in terms of the physical system it models, it works the way it does? If yes, then
does not the true utility of the neural network approach lie in the training phase, i.e., the
discovery of the input-output function of the physical system, after which, in the
running phase, the neural network can be discarded in favor of a concise algebraic and
Boolean restatement of what the network has learned?
9.2. Neural network techniques for instrumentation, measurement systems, and
robotic applications: theory, design, and practical issues
In this section we briefly review a wide-ranging sample of neural network applications
within the broad context of robotics. The specific applications, drawn primarily from the
activities at my home institution, are:
- Road Driving Vehicle Controller [20]
- Off-Road Driving Controller [21]
- Hand Tremor and Error Correction [22]
- Drowsy Driver Detection [23]
- Robotic Inspection of Aircraft Skin [24]
- Estimation of Stability Regions [25]
- Robot Models for Motion Planning [26]
- Numerical Solutions [27]
- Learning Human Control Strategies [28]
- Detecting Pedestrians in City Traffic [29]
- Chinese Character Recognition [30]
- Face Recognition [31]
- Gesture-Based Communication [32]
The aim of this selection is to give the reader the flavor of neural network application in
robotics via many brief summaries that in the aggregate span great breadth. In contrast,
section 3 will examine two applications - both from the author's laboratory - in detail.
9.2.1 Road driving vehicle controller
This application [20] involves a neural network implementation of a vehicle controller for
driving on roads and highways. As illustrated in Figure 10, it uses a three-layer, multi-output perceptron examining the video stream from a camera; each camera pixel
corresponds to an input (or S-) unit. There are five hidden (or A-) units, and 32 output (or
R-) units corresponding to 32 potential steering directions. Learning employs an
"evolutionary" approach vs. the back-propagation algorithm; this approach results in higher
training cost, but the resulting performance is empirically better. Application domain
specific error metrics are developed and employed to increase the effectiveness of the
training process. Apropos of the last question posed at the end of section 1, the paper looks
into the trained network in an attempt to understand what is actually being learned.
Issues, problems, and techniques of this application are summarized as follows:
- The task is a relatively simple one (to drive in lane), but the environment is extremely
complex (road or highway with traffic, distractions, imperfect lane markings, etc.); in
this sort of scenario, what is actually being learned, and what is its relevance to the
application task?
- The controller response to observed heading error is essentially linear; why not, then,
use a neural net for sensing but a conventional controller?
- Steering direction is determined by computing the centroid of ~30 analog outputs; what
is the advantage of this approach over a single bipolar analog output proportional to the
desired steering angle?
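For reference, the centroid read-out questioned above amounts to an activation-weighted mean over the output units. This sketch is generic: the unit-to-angle mapping and the straight-ahead fallback are assumptions, not details of ALVINN [20].

```python
# Centroid decoding of a bank of steering output units: each unit "votes"
# for an angle, and the commanded angle is the activation-weighted centroid.
# The angle assignment is supplied by the caller (an assumption here).

def steering_centroid(activations, angles):
    total = sum(activations)
    if total == 0:
        return 0.0                     # no evidence: steer straight (assumption)
    return sum(a * ang for a, ang in zip(activations, angles)) / total
```

One pragmatic argument for this encoding is that a hump of activation across several units averages out per-unit noise, which a single analog output cannot do.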
Figure 10: ALVINN. (left) The vehicle; (right) image input to the neural network
whose outputs indicate the steering direction needed to follow the road.
Figure 11: MAMMOTH modular neural network architecture for trained fusion of steering directions
obtained from independently trained image based and rangefinder based sensing modalities.
Figure 12: Tremor and error correcting tool; protective end cap has been removed to make actuators visible.
Issues, problems, and techniques related to this application may be summarized as follows:
- It is potentially difficult to distinguish tremor-related motion from error-related motion.
- How will the system distinguish surgeon error from surgeon intent?
- Is it necessary to re-train the neural network for each surgeon? Does it have to be done
before each surgery, or is training fast enough that it could be done during the first few
minutes of "set up"?
- How is it possible to assure stability of the closed-loop control system in all application
scenarios?
Figure 13: PERCLOS measure as a function of time-of-night and recentness of driver rest periods.
Figure 14: Regions with corrosion (left) and cracks (right) identified
by neural network classification of wavelet feature vectors.
Issues, problems, and techniques related to this application may be summarized as follows:
- How will it be possible to verify agreement between the results of automated inspection
and human inspection following government-mandated protocols?
- The preprocessing provided by the wavelet decomposition is clearly important to
achieving efficient neural net classification - versus, e.g., presenting the whole image to
a neural network - but how is it possible to verify that the preprocessing does not filter
out valuable clues to the presence of defects?
- Given the range of possible defects - and the flexibility of the human visual and
judgment systems in identifying them - how is it practically possible to obtain an
adequate training set, with an appropriate mix of defect and normal examples?
- Is it possible to decide a priori whether a neural network or a fuzzy logic classifier is
better matched to a particular classification task? (See section 3.)
9.2.6 Estimation of stability regions
The problem is the estimation of stability regions of autonomous nonlinear systems [25],
e.g., robots. The approach is to use empirical stability data to train a multi-layer neural
network - versus the usual differential equation model based analytical approach. The
methodology developed quantitatively characterizes regions of the control space with
stability estimates and their confidence intervals.
Figure 15: (left) Multi-layer neural network for estimation of stability regions in control space, and (right)
comparison of actual stability region, two conventional estimation models, and neural network estimate.
Issues, problems, and techniques related to this application may be summarized as follows:
- What is the difference - besides terminology (or "spin") - between neural net and other
parameter-based approximation methods?
- This application uses a multi-layer (i.e., two or more hidden layers) solution. Is there a
systematic way of designing minimal architectures that can represent reality?
- Similarly, how can we decide when, how, and why to use an architecture that admits
or requires connections between non-adjacent layers?
9.2.7 Robot models for motion planning
This example involves an application superficially similar to the previous one: modeling of
motion planning for a robot with multiple degrees of freedom having nonlinear interactions.
However, the issue in this case is not stability but path optimality [26]. Optimal motion
planning algorithms are well developed for electric-motor-driven robots, but they do not port
well to hydraulic robots. In the domain of interest, excavation and construction, optimality
translates directly into economics. The solution employs multiple neural networks to
model individual actuator response functions; the system model runs at about 75 times real-time, allowing the evaluation of multiple alternatives in advance of
commanding any actual motion.
Issues, problems, and techniques related to this application are summarized as follows:
- Issues common to modular neural networks, as discussed above.
- Although nonlinear, the actuator system range is well constrained both mechanically
and in terms of total available power and power available to individual actuators,
suggesting there may be an alternative analytical solution base.
- A ubiquitous human-machine interaction problem is apparent: automating control of
complex multi-actuated systems with non-intuitive human interfaces.
Figure 17: (left) Illustration of the "car-on-the-hill" problem, and (right) neural network approximation to the
control surface.
Issues, problems, and techniques related to this application may be summarized as follows:
- Issues are similar to those raised in other applications in which the neural network
technique is used to create a function-like connection between parameters and data.
- How does the result obtained compare, in actual structure and by various performance
measures, to numerical solution of the differential equations?
9.2.9 Learning human control strategies
The aim of this application is to learn how humans control a complex nonlinear
manipulator, and thereby to be able to incorporate human strategies in an automatic
controller [28]. A useful technology must incorporate validation of actual controller
performance for comparison with human performance and alternative control strategies. As
a practical matter, it proves appropriate to employ different learning strategies and models
for discrete and continuous time human actions. Measures of similarity and difference
between learned (neural network) and acquired (human) strategies are developed and
incorporated.
Issues, problems, and techniques related to this application are summarized as follows:
- Rationalization of mechanics and control: need for a general abstract model of the
manipulator based on degrees of freedom, ranges, sensitivities, etc.
- The high-level model is matched to anticipated tasks, but the implementation's controls
are not. A valid engineering implementation may need to incorporate explicit models - versus implicit, human-intuitive ones - based on component strengths, economic
considerations, etc.
- Should the emphasis then be on a (neural net?) "translator" between human (virtual)
and machine (actual) actuator controls?
- Does the system react gracefully to unexpected circumstances and events?
Figure 18: Human experience-based control. (a) Monitoring and capture of human control strategies.
(b) Architecture of Human Control System (HCS) controller.
Figure 19: Finding pedestrians in city traffic. (left) Stereo-based extraction of objects in the scene; (right)
neural network based identification of pedestrians in various poses, states of motion, etc.
facial expressions (eyes, eyebrows, cheeks) and lower facial expressions (wrinkles, lips),
then these are fused to characterize the overall expression (happy, angry, afraid, etc.). The
paper cited focuses particularly on the details of the upper face module; this neural network
recognizes seven "upper face action units" that are parametric components of a vector space
that is classified by the upper face neural network.
Figure 21: Face recognition application. (left) Specification of upper face features. (right) Modular neural
network approach for upper face recognition, lower face recognition, and fusion of the two.
Issues, problems, and techniques related to this application may be summarized as follows:
- Are there still issues related to the completeness of the parameterization? There are
many different ways to parameterize faces (etc.); do faces (etc.) with similar parameters
look similar to human observers?
- Are there still issues related to the suitability of the parameterization? Are the
parameters (close to) "orthogonal" with respect to, e.g., human perception?
- Is it necessary also to include dynamics, e.g., to achieve natural-looking speech,
laughing, sneezing, etc?
9.2.13 Gesture-based communication
The goal of this project is to effect communication with "service robots" via natural and/or
defined human gestures [32]. The approach is a neural net recognition and interpretation of
the human's static pose, dynamic gestures, etc. Issues that need to be addressed to achieve
robust performance in real-world environments include lighting intensity and color tracking
of the human "employer". The system was demonstrated in a trash clean-up task.
Issues, problems, and techniques related to this application are summarized as follows:
- Natural gestures have different meanings in different cultures; invented gestures put the
burden on the person rather than on the machine, thus negating the "natural interaction"
paradigm.
- How will the system distinguish a "gesture" from just randomly passing through a
"gesture state"?
- Like ALVINN [Figure 10], this implementation uses multiple output units to encode a
single scalar value (e.g., direction); this is pragmatically effective, but why it should
outperform, e.g., a single continuous output proportional to the encoded direction, is not
intuitively or quantitatively obvious.
Figure 22: Gesture and meaning. (top) Neural network based pose analysis;
(bottom) map of the robot's operational range.
9.2.14 Summary
To summarize:
- Robotics is a lot more than robots: few "robotics" people ever see an
anthropomorphic robot.
- "Robotic applications" are just applications.
- Neural nets in instruments carried by robots are just neural nets in instruments.
- Robot navigation, manipulation, etc., rarely require an instrument-like,
instrument-grade internal representation.
9.3. Case studies: neural networks for instrumentation and measurement systems in
robotic applications in research and industry
In this section we look in detail at two case studies of neural networks for
instrumentation and measurement in robotics:
- Robotic Enhanced Visual Inspection of Aircraft Skin [24,33]
- Odor Detection and Classification Using Arrays of Relatively Non-Specific Sensors
[34,35].
Both applications are drawn from parts of projects that were done in the author's lab, the
first primarily with graduate student Priyan Gunatilake and other collaborators, and the
second with graduate student Huadong Wu and other collaborators.
9.3.1 Robotic enhanced visual inspection of aircraft skin
We touched on the robotic inspection of aircraft application in both section 1, where we
examined a vision-based robot navigation algorithm based on finding "lines-of-rivets" on
the aircraft skin (Figure 8 and reference [19]), and in section 2, where we briefly introduced
the topic of flaw inspection, particularly for cracks and corrosion (Figure 14 and reference
[24]). In this case study we will look at the latter application in more detail. Readers
interested in a complete description, including comparison with alternative approaches and
complete contextual material on aircraft inspection practice and problems, should see [33].
Figure 23: (left) Hangar environment for robotic inspection of aircraft; (right) current practice,
inspectors in safety-harnesses on the aircraft crown.
Figure 25: CIMP. (left) Sensor pod containing camera, diffuse illumination, and dynamic spot
illumination; (middle) stereoscopic image pairs of lap joint (top), button-head rivet line
(middle), and sample from defect library (bottom); (right) inspector at stereoscopic
workstation.
Figure 26: Crack detection. (left) Raw image with cracks indicated; (right) processed
image with regions-of-interest isolated around rivets, crack-line regions identified (green
in the original), and cracks found and marked with measures of high confidence (red in the
original) and moderate confidence (blue in the original).
Figure 27: Corrosion detection. (left) Raw image with regions of actual corrosion, surface
dirt, and painted skin marked; (right) corroded regions identified with high confidence
(gray-level image) and with moderate confidence (checkerboard gray-level and black).
Figure 29: Wavelet decomposition for corrosion detection. The feature vector uses
components from the luminance and chrominance channels as illustrated.
Figure 30: Extension to multiple lighting alternatives. (top) Three lighting alternatives;
(bottom left and middle) outputs corresponding to the lighting conditions; (bottom
right) block-by-block selection of the highest-confidence of the three alternatives.
A small follow-up effort to extend the analysis to finding the optimum fusion of several
classifications is illustrated in Figure 30. Consistent with the human inspectors' practice of
examining the aircraft skin under multiple lighting conditions, we proceeded to (i) examine
multiple images with different lighting direction and directionality; (ii) define confidence in
terms of the absolute difference between output and threshold; and (iii) output a mosaic of
the individual block classifications having the highest confidence over the set of images
covering each block. Inspection of the figure illustrates the substantial improvement in
detection and removal of ambiguity obtained by this fusion technique.
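Steps (i)-(iii) can be sketched as a block-wise argmax over per-image confidences. The array shapes, the function name, and the 0.5 threshold below are illustrative assumptions, not details from [24].

```python
import numpy as np

# Block-wise fusion over differently lit images: for each block, keep the
# classification whose output is farthest from the decision threshold.
# `outputs` is (n_images, n_blocks); threshold 0.5 is an assumption.

def fuse_blocks(outputs, threshold=0.5):
    outputs = np.asarray(outputs, dtype=float)
    confidence = np.abs(outputs - threshold)      # step (ii)
    best = np.argmax(confidence, axis=0)          # most confident image per block
    fused = outputs[best, np.arange(outputs.shape[1])]
    return fused > threshold, best                # step (iii): mosaic + provenance
```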
9.3.1.6 Crack detection pipelines
Although crack detection and classification worked reasonably well using a neural network
scheme that paralleled the scheme outlined above for corrosion detection and classification,
a rule-based scheme implemented in a fuzzy logic framework worked substantially better.
For comparison, in this section we will thus outline the key features of the fuzzy logic
implementation. We will also attempt to discern why one of these problems seems to be
more amenable to a neural network solution, the other to a fuzzy logic, i.e., rule-based
solution.
First, crack-line features in the image were identified by a standard edge-finding
algorithm. Edges were then characterized by a five-component feature vector:
- edge length: the number of pixels in the edge;
- propagation depth: the number of scales in which the edge is seen;
- edge shape: the RMS difference between the edge pixels and a robust straight-line fit to
the edge pixels;
- edge type: normal or ridge edge type (see Figure 31);
- differential intensity: a measure of line quality in which scratches yield a negative
number and cracks yield a positive number.
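The five-component feature vector might be carried through the pipeline as a plain record, as in the sketch below; only the edge-shape component is computed here, and the field names are assumptions rather than those of [33].

```python
from dataclasses import dataclass

# Hypothetical container for the five-component edge feature vector;
# field names are assumptions, not those of the original implementation.
@dataclass
class EdgeFeatures:
    length: int              # number of pixels in the edge
    depth: int               # number of scales in which the edge is seen
    shape: float             # RMS deviation from a robust straight-line fit
    edge_type: str           # "normal" or "ridge"
    diff_intensity: float    # negative for scratches, positive for cracks

def edge_shape(pixels, a, b, c):
    # RMS distance of edge pixels (x, y) from the line a*x + b*y + c = 0,
    # assuming (a, b) is normalized so that a*a + b*b == 1
    return (sum((a * x + b * y + c) ** 2 for x, y in pixels) / len(pixels)) ** 0.5
```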
Discussions with visual inspectors of aircraft revealed many rules that they routinely
apply to distinguish among various line-like features on aircraft skins. Step-like features,
e.g., between painted and unpainted regions, are clearly neither scratches nor cracks, and
are easily eliminated both visually and algorithmically. Ridge-like features may be light on
a dark background or dark on a light background; in general the former are scratches and
the latter are cracks, but this cannot be guaranteed, since the appearance of a scratch may
change with both lighting angle and viewing angle. However, a true crack is almost
invariably dark. The inspectors' rules may be summarized in this slightly simplified form:
- if edge is dark and edge type is ridge then edge is crack;
- if edge is dark and edge shape is line then edge is crack;
- if edge is dark and edge length is short or medium then edge is crack;
- if edge is dark and edge propagation depth is low then edge is crack.
These rules are encoded as fuzzy logic membership functions as illustrated in Figure 32.
The figure also illustrates the meaning of "light and dark ridge-type edges".
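One conventional way to mechanize such rules is trapezoidal membership functions with min for "and" and max over the rules. The actual membership functions are those of Figure 32; every breakpoint and feature scale below is invented for illustration.

```python
# Fuzzy-rule sketch of the inspectors' crack rules; all breakpoints are
# illustrative assumptions, not the membership functions of Figure 32.

def trap(x, a, b, c, d):
    # trapezoidal membership: rises over a->b, flat over b->c, falls over c->d
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def crackness(edge):
    dark = trap(edge["darkness"], 0.3, 0.6, 1.0, 1.01)
    ridge = 1.0 if edge["type"] == "ridge" else 0.0
    liney = trap(1 - edge["shape"], 0.5, 0.8, 1.0, 1.01)   # shape: RMS deviation
    short = trap(edge["length"], -1, 0, 40, 80)            # pixels
    shallow = trap(edge["depth"], -1, 0, 2, 4)             # wavelet scales
    # each rule's strength is min of its antecedents ("and");
    # overall "crackness" is the max over the four rules ("or")
    rules = [min(dark, ridge), min(dark, liney),
             min(dark, short), min(dark, shallow)]
    return max(rules)
```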
So we may now ask: why does fuzzy logic work better for cracks, whereas neural networks
work better for corrosion? We don't know, but we feel we can offer some reasonable
speculation and, based on this experience, some higher-level guidance about which kinds of
problems are most amenable to which techniques. It seems that corrosion is detected by a
relatively low-level two-dimensional pattern-matching process with little reasoning
involved:
<this feature vector> is located in <a particular subspace> of the space of feature
vectors that has been segmented into regions corresponding to <corrosion>, <no
corrosion>, <possible corrosion>
In contrast, crack-like features are best classified as being cracks, scratches, etc., via a
relatively higher-level semantic reasoning process:
... if <feature appearance> then <feature nature> ...
involving the invocation of a set of rules that are conveniently implemented mechanically
via a fuzzy logic formulation.
Figure 32: Fuzzy logic, i.e., rule-based, alternative to the neural network classifier.
(left) Membership functions; (right) illustration of "light and dark ridge-type edges".
e.g., explosives detection and identification. Among the enormous variety of chemically
responsive sensors and instruments available, we are particularly interested in metal oxide
semiconductor (MOS) chemically sensitive resistors. These "Taguchi sensors" [35,36] are
in widespread use in many applications, especially in Japan, where sensing for residential
gas leaks is mandatory.
Tin is commonly the metal employed. Its oxide, SnO2, is a ceramic-like insulator. But
if the oxide is slightly reduced, to SnO2-ε (where ε is small), then the slight excess of
metal, i.e., free electrons, makes it an n-type semiconductor. Adding oxygen to the
surrounding gas phase environment causes ε to decrease, thus causing the material's
resistivity to increase, whereas removing oxygen - or adding a reducing (fuel) gas -
causes ε to increase, thus causing the material's resistivity to decrease. Empirically the
resistance of a MOS sensor obeys fairly well the relationship R = R0 ([O2]/(1 + KX[X]))^β,
where R0 is a baseline resistance, [O2] is the concentration of oxygen in the environment,
[X] is the concentration of a reducing gas contaminant in the environment, KX is the rate
constant for reaction between X and O2, and β is a constant of order unity that depends
on the particular sensor.
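The empirical relationship can be exercised numerically. The sketch below encodes it directly; r0, kx and beta are illustrative, uncalibrated values, and the concentration units are arbitrary but consistent.

```python
def mos_resistance(o2, x, r0=1.0e4, kx=50.0, beta=1.0):
    """R = R0 * ([O2] / (1 + Kx*[X]))**beta, the empirical Taguchi-sensor
    model from the text.  r0, kx and beta are illustrative, uncalibrated
    values; o2 and x are concentrations in consistent (arbitrary) units."""
    return r0 * (o2 / (1.0 + kx * x)) ** beta

# Adding a reducing gas X lowers the resistance, as described above:
clean_air = mos_resistance(o2=0.21, x=0.0)      # no contaminant
contaminated = mos_resistance(o2=0.21, x=0.01)  # trace of reducing gas
```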
The good news is that MOS sensors are inexpensive, rugged, and - for appropriate
sample types - exquisitely sensitive. The bad news is that there is no way for one of these
sensors to distinguish between a small concentration of environmental contaminant X for
which Kx is relatively large and a large concentration of environmental contaminant Y for
which Ky is relatively small. That is, their sensitivity is potentially high, but they have no
selectivity.
Fortunately there is an elegant solution: the detailed sensitivities to various
environmental contaminants can be modified substantially - albeit usually only empirically
- by changing the operating temperature, by adding trace quantities of various metallic
catalysts, or both. If we have two contaminants, X and Y, and two sensors, A and B, each
sensitive to X and Y but with somewhat different sensitivities, then from the responses of A
and B together we can calculate the individual concentrations [X] and [Y]. Similarly, from
N sensors each of which has a distinct pattern of response to N contaminants we can obtain
enough information to calculate all N concentrations.
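In a small-signal (linearised) regime this two-sensor argument is just a matrix inversion: if each sensor's response is approximately a weighted sum of the contaminant concentrations, recovering [X] and [Y] is a linear solve. The sensitivity numbers below are invented for illustration.

```python
import numpy as np

# Hypothetical small-signal sensitivities: rows are sensors A and B,
# columns are contaminants X and Y.  The rows must be linearly
# independent, i.e. the sensors must respond with different patterns.
S = np.array([[5.0, 1.0],    # sensor A: strong to X, weak to Y
              [1.5, 4.0]])   # sensor B: weak to X, strong to Y

true_conc = np.array([0.02, 0.05])         # [X], [Y] to be recovered
responses = S @ true_conc                  # what the two sensors report

recovered = np.linalg.solve(S, responses)  # invert the response pattern
```

With N sensors and N contaminants the same solve applies to an N-by-N sensitivity matrix, provided its rows remain linearly independent.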
Figure 33 illustrates a typical type of SnO2 sensor sensitivity dependence on target
chemical species - ethanol, methanol, and heptane - for two sensor temperatures. Notice
the shorter recovery time of the hotter sensor, as expected.
Since the relationship R = R0 ([O2]/(1 + KX[X]))^β is nonlinear - and since KX, the
sensitivity to X, often depends on the concentration of moisture or of another contaminant
Y - in practice the concentrations of multiple simultaneously present components are rarely
easy to actually back out. These difficulties present us with at least three opportunities for
neural network solution:
- identifying the model parameters (β, KX) that describe response to individual
contaminants as a function of concentration and temperature;
- calibrating multi-sensor systems in the face of cross sensitivity, i.e., the sensitivity to X
depends on the concentration of Y;
- learning the responses that particular sensors and sensor arrays show to new types of
environmental contaminants, chemical warfare threats, odors symptomatic of health
problems, etc., without any requirement to redesign hardware or modify the architecture
of existing software.
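The first of these tasks, identifying (β, KX) from calibration data, can be illustrated even without a neural network. The sketch below fits the empirical resistance model to synthetic calibration data by a coarse grid search over (KX, β); a network or any nonlinear optimiser would play the same role on real, noisy data. All numbers are invented.

```python
import numpy as np

def model(x, r0, o2, kx, beta):
    """Empirical MOS response R = R0 * ([O2] / (1 + Kx*[X]))**beta."""
    return r0 * (o2 / (1.0 + kx * x)) ** beta

# Synthetic calibration curve generated from assumed "true" parameters.
x_cal = np.linspace(0.0, 0.05, 20)            # contaminant concentrations
r_cal = model(x_cal, r0=1e4, o2=0.21, kx=40.0, beta=0.9)

# Coarse grid search over (Kx, beta), minimising the squared residual.
best = min(
    ((kx, beta)
     for kx in np.linspace(10.0, 80.0, 71)
     for beta in np.linspace(0.5, 1.5, 51)),
    key=lambda p: float(np.sum((model(x_cal, 1e4, 0.21, p[0], p[1]) - r_cal) ** 2)),
)
```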
Figure 34 illustrates, on the left, an assortment of Taguchi-type sensors. They are all
"homemade" [35] except for the commercial sensor in the middle column, last row (this is
one sensor with the protective cap removed). The one commercial device is a single sensor,
but all the homemade ones are in one way or another integrated arrays of multiple sensors.
The enlargement on the right shows one of these: the three different color shades in the
horizontal rows indicate that each row has been prepared with a different noble metal
catalyst. The dark vertical stripe at the left is a resistive heater; there is thus a temperature
gradient decreasing from left to right across the device. By appropriate selection of
contacts, 25 different resistances can be measured, each characteristic of a particular
temperature and catalyst.
Figure 33: Sensitivity to transient samples of ethanol, methanol, and heptane of two
chemically sensitive resistors, R17 and R13, essentially identical but R17 is at a higher
temperature. Horizontal axis is time, vertical axis is percent change in resistance from
baseline.
Figure 35: Classification and quantitation: (left) classification - output is one of two
components (dots in lower left and upper right corners); (right) quantitation: output is
fractional concentration of two components.
[1] Rosenblatt F, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington DC: Spartan Press, 1961.
[2] Minsky M, "Steps Toward Artificial Intelligence," Proceedings of the IRE, 1960, 49: pp. 8-30. Available on-line at http://www.ai.mit.edu/people/minsky/papers/steps.html
[3] Galkin I, "Crash Introduction to Artificial Neural Networks," 2001, http://ulcar.uml.edu/~iag/CS/Intro-to-ANN.html
[4] Hertz J, Krogh A, and Palmer R G, Introduction to the Theory of Neural Computation, vol. I: Addison-Wesley, 1991. (Santa Fe Institute Series in the Sciences of Complexity)
[5] Lewis F L, "Neural Network Control of Robot Manipulators," in IEEE Expert Intelligent Systems & Their Applications, 1996.
[6] Dorst L, Lambalgen M v, and Voorbraak F, Reasoning with Uncertainty in Robotics: International Workshop, RUR'95, Amsterdam, The Netherlands, December 4-6, 1995; Springer (1996).
[7] Hebert M, Thorpe C E, and Stentz A, Intelligent Unmanned Ground Vehicles: Autonomous Navigation Research at Carnegie Mellon, Kluwer (1997).
[8] Lewis F L, Jagannathan S, and Yesildirek A, Neural Network Control of Robot Manipulators and Nonlinear Systems, Taylor & Francis (1999).
[9] Omidvar O and Smagt P v d, Neural Systems for Robotics, Academic (1997).
[10] Pomerleau D A, Neural Network Perception for Mobile Robot Guidance, Kluwer (1993).
[11] Wilson E, Experiments in Neural-Network Control of a Free-Flying Space Robot, NASA NTIS (1993).
[12] Zalzala A M S and Morris A S, Neural Networks for Robotic Control: Theory and Applications, Ellis Horwood (1996).
[13] Fogelman Soulie F and Gallinari P, Industrial Applications of Neural Networks, 1998. (From the ICANN'95 conference of the European Neural Network Society.)
[14] Neural Network Applications in Manufacturing (compiled primarily by Stefan Korn, Glasgow
Caledonian University) http://www.emsl.pnl.gov:2080/proj/neuron/bib/manufacturing.html
[15] NN Reference - Robotics (books, classic papers, etc)
http://www.nd.com/nnreference/nnref-robotics.htm
[16] Marr D and T Poggio (1976). Cooperative computation of stereo disparity. Science, 194:283-287.
[17] Albus J and J M Evans Jr, sidebar in "Robot Systems", Scientific American, February 1976.
[18] Albus J, "A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller
(CMAC), Journal of Dynamic Systems, Measurement and Control, American Soc. of Mechanical
Engineers, Sep 1975.
[19] Davis I and M Siegel, "Vision Algorithms for Guiding the Automated NonDestructive Inspector of
Aging Aircraft Skins", presented at SPIE Conference on Aging Infrastructures, San Diego CA, 1993.
[20] Baluja S, Evolution of an artificial neural network based autonomous land vehicle controller,
http://www.ri.cmu.edu/pubs/pub_3832.html and P Batavia, D Pomerleau and C Thorpe, Applying
Advanced Learning Algorithms to ALVINN, 1996, Robotics Institute, Carnegie Mellon University,
http://www.ri.cmu.edu/pubs/pub_423.html
[21] Davis I and Stentz A, Sensor Fusion for Autonomous Outdoor Navigation Using Neural Networks,
Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robotic Systems (IROS '95),
1995, http://www.ri.cmu.edu/pubs/pub_3619.html and
http://www.ri.cmu.edu/pub_files/pub2/davis_ian_1995_2/davis_ian_1995_2.pdf
[22] Ang W-T, C Riviere and P Khosla. An Active Hand-held Instrument for Enhanced Microsurgical
Accuracy, Third International Conference on Medical Image Computing and Computer-Assisted
Intervention, 2000, http://www.ri.cmu.edu/pubs/pub_3511.html
[23] Grace R, V E Byrne, D M Bierman, J M Legrand, D Gricourt, B K Davis, J J Staszewski and B A
Carnahan, Drowsy Driver Detection System for Heavy Vehicles, 17th Digital Avionics Systems
Conference, 2001, http://www.ri.cmu.edu/pubs/pub_3644.html
[24] Gunatilake P and Siegel M, "Remote Enhanced Visual Inspection of Aircraft by a Mobile Robot," in
1998 IMTC Conference, 1998, pp. 49-58. http://www.ri.cmu.edu/pubs/pub_1316.html
[25] Ferreira E and B Krogh, Training Guidelines for Neural Networks to Estimate Stability Regions,
Proceedings of 1999 American Control Conference, 1999 June, v4 pp.2829 - 2833.
http://www.ri.cmu.edu/pubs/pub_3064.html
[26] Murali K and Bares J, Constructing Fast Hydraulic Robot Models for Optimal Motion Planning, Field and
Service Robotics Conference (FSR '99), 1999 August. http://www.ri.cmu.edu/pubs/pub_2932.html
[27] Munos R, L Baird, and A Moore, "Gradient Descent Approaches to Neural-Net-Based Solutions of the
Hamilton-Jacobi-Bellman Equation," in International Joint Conference on Neural Networks. 1999.
http://www.ri.cmu.edu/pubs/pub_2623.html
[28] Nechyba M, "Learning and Validation of Human Control Strategies" (thesis), Robotics Institute
Carnegie Mellon University, 1998. http://www.ri.cmu.edu/pubs/pub_478.html
[29] Liang Z and C Thorpe, "Stereo and Neural Network-based Pedestrian Detection," presented at Int'l
Conf. on Intelligent Transportation Systems, 1999. http://www.ri.cmu.edu/pubs/pub_3317.html
[30] Romero R, D Touretzky, and R H Thibadeau, "Optical Chinese Character Recognition using
Probabilistic Neural Networks", 1996. http://www.ri.cmu.edu/pubs/pub_2962.html
[31] Tian Y-L, T Kanade, and J Cohn, "Recognizing upper face action units for facial expression analysis,"
IEEE Conference on Computer Vision and Pattern Recognition (CVPR '00), 2000.
http://www.ri.cmu.edu/pubs/pub_3625.html
[32] Waldherr S, S Thrun, and R Romero, "A neural-network based approach for recognition of pose and
motion gestures on a mobile robot," 5th Brazilian Symposium on Neural Networks, 1998, pp. 79 -84.
http://www.ri.cmu.edu/pubs/pub_3589.html
[33] Siegel M, P Gunatilake, and G W Podnar, "Robotic Assistants for Aircraft Inspectors," in IEEE
Instrumentation and Measurements (I&M) Magazine, vol. 1: IEEE Instrumentation and Measurements
Society, 1998, pp. 16-30.
[34] Wu H-D and M Siegel, "Odor-Based Incontinence Sensor," in IEEE Instrumentation and Measurement
Technology Conference (IMTC'2000). Baltimore MD: IEEE Instrumentation and Measurement
Society, 2000.
[35] Siegel M, "Olfaction, Metal Oxide Semiconductor Gas Sensors, and Neural Nets," in Traditional and
Non-Traditional Sensors for Robotics (NATO Advanced Workshop, Maratea Italy), vol. F63, T.
Henderson, ed., Berlin Germany: Springer-Verlag, 1990, pp. 143-157
[36] Taguchi N, US Patent 3 695 848, 1972.
Chapter 10
Neural Networks for Measurement
and Instrumentation in Laser Processing
Cesare ALIPPI
Department of Electronics and Information, Politecnico di Milano
piazza L. da Vinci 32, 20133 Milano, Italy
Anthony BLOM
Centre For Technology - Mass Products & Technology, Royal Philips Electronics NV
P.O. Box 218, 5600 MD, Eindhoven, The Netherlands
Abstract. Laser processing is in general a complex process, requiring a lot of
knowledge and experience for introducing and maintaining it in industry. This
"expert knowledge threshold" obstructs the acceptance of laser technology for new
applications. Introduction of process monitoring techniques in combination with
sophisticated data analysis tools and artificial intelligence has opened new options to
add self-tuning capabilities and closed loop feedback control to laser processing
equipment. Some very interesting work has been done in recent years by using soft
computing techniques to reach a new level of equipment performance in the field of
laser material processing. Advances have been obtained for different types of laser
processes, ranging from heavy-industry seam welding in shipyard building and the
automotive industry down to laser cutting of metal sheets and micro spot welding in the
electronics industry. Multi-sensor process monitoring systems have been evaluated and their
(multi-dimensional) outputs related to the process performance through soft
computing techniques. Sets of fast sensors are the basic elements to monitor the
process from which signal features are extracted and processed by composite
traditional/neural-based techniques to perform automatic classification of welded
and cut parts. The article gives a comprehensive presentation of laser processing
technology, starting from the basic physics of the process up to a set of industrial
applications solved by the interaction of traditional processing techniques and neural
network ones.
10.1. Introduction
Although laser processing has been an accepted technology in industry for several years, the
activity still requires highly educated process engineers to maintain process performance
and to introduce new applications. To address this problem, most self-respecting laser
equipment suppliers run an application laboratory to carry out dedicated application
development for their customers.
In order to make laser technology more easily accepted in industry, there is a drive to
introduce self-tuning characteristics to the laser processing equipment, making the system
more robust to changing parameters in the process. Introduction of feedback techniques
based on information from the evolving process is necessary to achieve this goal. Some
form of process monitoring has to be introduced and the process signals have to be
analysed and related to the quality of the process. It is at this point where the strength of
sophisticated soft computing technologies is essential. The main problem here is that the
quality of the process operation cannot be measured directly with a straightforward
measurement, as is done with traditional feedback loops. Instead, the desired information
has to be derived from indirect measurements made on the process. Therefore, multi-sensor
systems and/or sensor arrays (cameras) are envisaged to monitor the evolution over time of
the entities involved in the laser process.
Soft computing routines are indispensable to process the recorded data for identifying
the significant signal properties and finding the relations between the process and the
sensor signals since laser processes feature a relatively high degree of stochastic variation
in the sensor signals due to the partly chaotic behaviour of the actual fusion process. This
puts even higher demands on the signal processing techniques to achieve a certain
confidence level of the extracted information. Having the laser process operational, it
becomes interesting to know whether the processed artefact is good or not.
This quality analysis - most often associated with the development of a classifier - is the
natural playground for Neural Network structures or similar techniques. Of course, the
accuracy of the final solution to the application is a main goal to be pursued, but not the
only one. In fact, fast calculation routines are essential to prevent the process from
running into out-of-control conditions or, at a higher level, to satisfy real-time requirements.
The accuracy/constraints trade-off can be reached by exploring the solution space of the
application and investigating different solutions integrating traditional processing
techniques and parameterised ones.
The structure of the paper is as follows. Section 2 provides an introduction to the basic
physics and laser sources associated with laser processing. An overview of the most
relevant laser-based applications is given in section 3. Section 4 focuses on the design of a
composite traditional/neural network-based system solving the quality analysis problem.
Aspects related to candidate solution generation, training, validation, features extraction
and selection are addressed in detail. Finally, section 5 provides three applications related
to laser processing (seam welding, laser cutting and laser spot welding) in which the
authors have taken active part in designing the systems and developing a composite
solution. Such applications have been part of two European Union projects: the Brite-Euram project "MAIL" (Multi sensor Assisted Intelligent Laser processing) and the IMS
project "SLAPS" (Self tuning user independent LAser Processing unitS).
10.2. Equipment and instrumentation in industrial laser processing
Laser material processing is a thermal treatment of the material. The energy flow is coming
from a very intense light source within a very limited wavelength range. This limited
wavelength range is obtained from the stimulated emission of light from a cavity, i.e., an
optical oscillator. The most important feature of the light from these laser sources is that it
can be focussed to very small spot sizes, enabling very precise and local heating of the
work piece. Moreover, light can be manipulated very comfortably by using scanner mirrors.
Fast systems can be created in this way, as the dynamic behaviour of the scanning system is
the main limiting factor. Because there is no physical contact between energy source and
the work piece, there is a lot of design freedom in the machinery. Moreover, the contactless
treatment is of high importance for the ever-increasing miniaturisation in modern industry.
One of the features of the focussed laser beam is that it enables deep processing, which
opened the way to laser drilling, laser cutting and heavy industry laser welding.
10.2.1 Laser sources
A laser source is an optical resonator in which a certain amount of energy bounces back
and forth between two mirrors as a light wave. The light is emitted by a medium between
the mirrors which determines the wavelength of the light (colour) because it is related to a
specific energy jump of the electrons of the excited medium. The medium is excited by
means of an external light source. The electrons of the excited atoms can fall back to the
lower level spontaneously or stimulated by the external excitation source as explained by
A. Einstein in 1915. The special thing about the stimulated emission is that these emissions
have the same wavelength, phase and polarisation as the source it was excited by.
From the time that the theoretical proof was given that the laser could exist (1915), it took
about 45 years before one was actually built and put into operation [1].
Nowadays, there are quite a number of different laser sources; only those types which are
used for the processes to be discussed will be described briefly here.
10.2.1.1 CO2 laser
The CO2 laser was one of the first types to be developed and used for industrial
applications. The laser is of the molecular gas type, with a mixture of gases: He (65-80%),
N2 (15-22%) and CO2 (5-13%). Only the CO2 gas is responsible for the laser radiation,
helped by the fact that nitrogen can be excited rather easily. This is done by means of a
gas discharge in the gas mixture, exciting the nitrogen into a level which coincides with the
excitation level of the carbon-dioxide molecules. Energy is transferred from the N2 molecules
to the CO2 molecules, with He acting as a catalyst.
The main advantage of this laser type is its very high efficiency, around 5-15%. It also
has a very good beam quality. Due to the large wavelength of about 10 μm, the diffraction
behaviour is coarser than what we are used to in the visible wavelength range: very small
spot sizes are thus not possible with this laser type.
10.2.1.2 Nd:YAG laser
Nd:YAG lasers are solid-state lasers based on the excitation of a neodymium (≈1%) doped
yttrium-aluminium garnet (YAG) crystal. The crystal is pumped with an optical source,
which can be a tungsten halogen lamp, a krypton arc lamp or a solid-state laser diode array
(AlGaAs). The pumping bands are in the region of 800 nm, after which several laser lines
exist (the 1064 line is the most important one). An often-used arrangement is an elliptical
cylinder filled with water for cooling the system. The Nd:YAG crystal rod is placed in one
focus line of the cylinder, while the pumping (flash)lamp is placed in the other.
The lasers can operate either in continuous wave mode (CW) or in pulsed mode, either
via switching of the pumping lamps or using a Q-switch (switching the damping of the
optical resonator). The emission will not start as long as the damping remains high. The
pumped optical energy will remain in the Nd:YAG rod. Immediately after switching the Q,
the accumulated energy will become available via stimulated emission in the form of a laser
pulse.
10.2.2 General laser processing aspects
For all applications in laser processing there are some general rules related to the
interaction between laser radiation and the work piece material to be processed [2,3].
10.2.2.1 Absorption, reflection, transmission
Radiation onto a surface will partly be absorbed by the material, partly reflected and partly
pass through the material. The actual interaction between laser beam and material is taking
place in a very thin upper layer of the work piece. Heat conductivity has to take care of the
distribution of the energy through the structure.
Figure 1: Laser processing phases; Heating (1), Melting (2), Fusion (3), Full penetration (4)
Laser cutting heads contain the focussing optics and a nozzle, which provides the
processing gas. The laser energy melts the material, while the processing gas blows out the
molten metal from the gap. Often a reactive gas (O2) is used, in which case the reaction
provides extra energy for the cutting process.
Most machines are meant for processing steel or steel alloys from about 1 mm up to 10
mm or even thicker. CO2 lasers are mostly used for these types of machines because of their
high power output and high efficiency.
10.3.1.1 Most important process disturbances
There are many process parameters which have to be set for a certain process. The most
important ones are: processing gas flow, cutting speed, laser power and focus position.
Modern machines are equipped with automatic focussing techniques: a technology which
measures the distance between the processing head and the work piece and maintains it
at a specific fixed value for that machine. This would normally not create problems.
Operators are always in a hurry and tend to obtain the highest possible cutting speed in
order to shorten the process time. In case the cutting speed gets so high that the processing
gas is not able to remove the material, the cutting process loses efficiency very fast. In
such a case the laser beam 'runs out of the cut'. Modern machines detect the threatening
danger of this situation by detecting the amount of plasma between work piece and nozzle.
When an excessive plasma level is detected, the processing speed will be decreased to
restore good cutting performance. When the cutting speed enters a critically high zone, the material
removal becomes unstable, which results in the forming of burrs and metal pearls on the
lower edge of the work piece and along the cutting sides. Although the cutting (separation)
is still achieved, the result can be unacceptable. Selecting a safe cutting speed is the
alternative, which means that the most economic cutting speed is not used.
Experts who are working with laser cutting machines have noticed that there is a
relation between the pattern of the sparks coming from the cut at the lower side of the work
piece and the quality of the cut. Although it seems pretty logical to look at the spark
pattern, the relation between cutting quality and spark pattern is not that simple. Besides
changing with work piece geometry, it also changes significantly with different materials.
The challenge is to find the relationship between the quality of the cutting process and
the behaviour of the pattern of sparks.
10.3.1.2 Process monitoring
The distance between work piece and processing head is measured using a non-contact
capacitive or in-contact inductive or resistive detection technique. A straightforward control
technique is implemented to drive the focus servo loop for optimum focus position.
Important for the proper operation of the capacitive sensing technique is that the amount of
plasma between work piece and processing head remains low, because it can have dramatic
effect on the impedance between nozzle tip and work piece. Plasma monitoring can be used
to identify the 'running out of the beam' in case the processing speed is too high for the
given process conditions. Immediate change of the cutting speed or laser power is the
remedy. Best of all would be to maintain the process in a condition well away from any
unsafe or critical situation, resulting in good cutting quality while remaining at the highest
possible processing speed. When it is possible to find a clear relation
between the sparks pattern and the cutting quality, this would be a good tool for
classification of the process quality as well as an option to control the process for optimum
performance. A standard CCD camera can be used for this purpose, in combination with a
frame grabber board for capturing and storing the images on a mass storage medium.
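The plasma-based speed guard described above can be sketched as a one-step control rule; the thresholds and gains below are invented placeholders, not values from a real machine.

```python
def adjust_speed(speed, plasma_level, plasma_limit=0.8,
                 slow_factor=0.9, speed_up=1.02, v_max=100.0):
    """One control step: back off when plasma is excessive, otherwise
    creep back toward the maximum economic speed.  plasma_level is a
    normalised [0, 1] reading; all thresholds and gains are invented."""
    if plasma_level > plasma_limit:       # beam about to 'run out of the cut'
        return speed * slow_factor        # slow down immediately
    return min(speed * speed_up, v_max)   # otherwise speed up gradually
```

Called once per control cycle, this keeps the machine hunting near the highest speed that the plasma monitor will tolerate.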
10.3.1.3 Automatic classification and options for control
Automatic control of the height of the cutting head above the surface of the work piece is
already state of the art technique. Recording of the plasma behaviour and optical emission
during the piercing and cutting processes is one important source of information about the
behaviour of the process and the related work piece quality. Analysing the 'spreading of
sparks' in relation to the quality of cutting is a very promising technique, both as far as
automatic classification of processed work pieces is concerned as well as for control of the
process conditions. Cutting speed and/or laser power are the most obvious parameters to be
controlled in this case.
10.3.2 Laser seam welding
Laser seam welding is a technique which is used over a wide range of industrial activities.
It ranges from fine seam welding of battery casings, pacemaker casings via seam welding
of automobile body parts, automotive engine- and transmission parts up to seam welding of
subassemblies for container vessels.
An often-used weld geometry is the butt joint, in which case the two parts that have to
be joined are placed next to each other in the same plane. Another geometry frequently
used when processing metal sheets is the overlap fillet weld or lap joint. One metal sheet is
placed over the other in this case, while welding at the edge of the top plate. Also the
T-joint is often used, which means that a plate is positioned (almost) perpendicular to a
second sheet and welded in both corners between these plates.
For seam welding, the local processing area is moved along the line where both work
piece parts meet. The most used laser systems for seam welding are CO2 lasers (heavy
industry and automotive) and continuous Nd:YAG lasers (low power seam welding). Also
seam welding by overlapping spot welds, using a pulsed Nd:YAG laser is applied in
industry. What makes laser welding so special is the ability to obtain a deep processing
depth due to the keyhole effect. This is very different from arc welding, which is its
biggest competitor. There the heat is mainly generated in the subsurface of the
work piece parts, where the largest portion of the current flows. With the keyhole in action
and the processing area moved along the seam, new material will be melted at the 'front' of
the keyhole, while material will cool down and solidify in the 'trail' of the keyhole. In this
way, the actual fusion takes place in the trail of the keyhole (see figure 3).
More sophisticated process monitoring techniques are being developed at this moment,
revealing information about the thermal distribution of energy through the work piece
structure. Either line array cameras or 2-D array cameras are used for this technology. The
recently introduced CMOS cameras are of interest here for their large dynamic range.
They can view the high intensity of the actual keyhole as well as the low-level
(near-)infrared emission of the trail of the seam. Although the advantages of imaging technology are
there, the implementation in industrial set-ups is not always easy and certainly needs more
sophisticated processing.
10.3.2.3 Automatic classification and options for control
Classification of seam welds is nowadays often done on the basis of the optical emission from
the keyhole area, by observing the behaviour of this emission over time. Deviations from
the nominal value exceeding pre-set values are identified as defective seam welds.
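This classification rule, flagging the weld when the emission trace deviates from nominal by more than a pre-set amount, can be written down in a few lines; the tolerance value below is illustrative.

```python
import numpy as np

def classify_seam(emission, nominal, tolerance=0.15):
    """Label a weld defective when the keyhole emission trace deviates
    from its nominal trace by more than `tolerance` (fractional) at any
    sample.  The tolerance value is illustrative, set per process."""
    deviation = np.abs(emission - nominal) / nominal
    return "defective" if np.any(deviation > tolerance) else "good"
```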
As mentioned in the previous chapter, control actions can only be implemented if the
cause of the process deviations can be traced down to a certain (set of) process parameters,
which have to be changed for better performance. The new (heat) imaging technique which
is coming up for seam weld process monitoring will need image analysis techniques to
extract the information about the performance of the process. Developments are in full
progress now on this topic.
10.3.3 Laser spot welding
Laser (micro) spot welding is a joining technology, which is frequently used for miniature
welds on small products. The typical characteristic of this joining technology is that the
laser beam remains focussed on the same spot while processing. The processing times are
short, in the order of 120 ms. The laser power during the short time is still considerable,
ranging from a few hundred Watts for thin stainless steel up to several kilowatts for copper
parts. Pulsed Nd:YAG lasers are mostly used for this type of joining technology.
With the beam stationary on the same spot, the process runs very fast from the heating
phase into melting of the metal and soon after that to keyhole operation. The short process
time gives a low thermal loading of the work piece, which is one of the most important
features of this joining technology. The process runs rather stable on stainless steel products
due to the physical behaviour of stainless steel. For other metals the spot welding behaviour
can be much less stable. Copper for instance, is a material having a low absorption
coefficient for 1064 nm light (about 5% at room temperature), while the heat conductivity
is also very high. It is difficult to get the energy in, and when it is in, it is distributed very
fast through the structure: the surface temperature will only increase slowly. With
increasing temperature however, the absorption increases while the heat conductivity
decreases: energy will be accepted easier while it is diffusing slower through the structure.
This leads to an avalanche kind of process behaviour. Going into the keyhole type of
process phase gives another absorption increase, which means a change in process
behaviour. The only option which state of the art technology offers now is to use a very
carefully chosen pulse shape: the laser power changes over time during the laser pulse.
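The avalanche on copper can be illustrated with a toy zero-dimensional heating model in which the absorption coefficient grows with temperature, so the same laser power is taken up ever faster. All coefficients are invented for illustration and are not copper data.

```python
def copper_heating(power_w=2000.0, steps=2000, dt=1e-5):
    """Toy 0-D heating model: absorption rises with surface temperature,
    so heating accelerates.  All coefficients are invented, not copper data."""
    T, T0, heat_cap, loss = 300.0, 300.0, 0.005, 0.05   # K, K, J/K, W/K
    temps = [T]
    for _ in range(steps):
        absorp = min(0.05 + 2e-4 * (T - T0), 0.6)  # absorption grows with T
        dT = (absorp * power_w - loss * (T - T0)) * dt / heat_cap
        T += dT
        temps.append(T)
    return temps
```

The per-step temperature rise grows over the run: the late increments are several times the first one, which is the avalanche the text describes.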
10.3.3.1 Most important process disturbances
The spot welding processes for stainless steel run very well with a stable process window.
The main process disturbances are poor quality of the cutting and forming tools, which
leaves burrs and scratches on the parts, and/or improper closing of folded metal sheets
onto each other. Pollution of the optics during operation can be a serious problem when no
proper actions, like preventive maintenance and cleaning, are implemented in manufacturing.
Variations in absorption coefficient and heat conduction (gap between parts) are
important factors in laser processing of copper.
10.3.3.2 Process monitoring
The whole process of micro spot welding is based on the absorption of the laser energy and
the distribution of this energy over the work piece geometry over time. A set of sensors is
used to monitor the performance of the process. The use of multi-sensor process monitoring
is important here because the short spot weld process hardly reaches a stationary
situation: the heating, melting, fusion and cooling phases all have a significant influence
on the process result. Physical phenomena related to all these process phases have to be
monitored and evaluated.
Laser input power and reflected laser power are monitored to evaluate the in-coupling
of the laser energy. The infra-red emission from the weld spot is used to detect the
behaviour of the surface temperature via the T^4 relation between temperature and emitted
power (Stefan-Boltzmann law). The effect of plume emission, related to the evaporation of metal, is detected via the
optical emission from this plume.
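Because of the T^4 dependence, small changes in surface temperature produce large changes in emitted power, which makes the infra-red channel a sensitive probe. A short numeric sketch (the emissivity value is an illustrative assumption, not a figure from this chapter):

```python
# Radiated power per unit area from the Stefan-Boltzmann law, P = eps * sigma * T^4.
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4

def radiated_power(temp_k, emissivity=0.3):
    # The emissivity value is illustrative, not taken from the chapter.
    return emissivity * SIGMA * temp_k ** 4

# A 10% temperature rise gives about 46% more emitted power (1.1^4 = 1.4641),
# which is why the infra-red signal is such a sensitive temperature probe.
ratio = radiated_power(1650.0) / radiated_power(1500.0)
```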
The physical properties of a metal, like electrical resistance and magnetic permeability,
change with temperature. This is the basis for using eddy current detection techniques to
monitor the penetration of the melt pool through a structure of metals.
It is well known among experienced laser equipment maintenance people that one can
hear from the process whether it runs properly or not, at least for certain process
conditions such as proper focussing. Based on this information, the acoustic
emission in the frequency range between 50 Hz and 20 kHz is detected, using a small
microphone. The image of the spot welded surface can tell a lot about the quality of the
weld by its appearance. Basic information like spot size, weld symmetry and presence of
spatters are important from the quality point of view and can reveal extra information about
the source of errors.
10.3.3.3 Automatic classification and options for control
For several years, investigations have been carried out to find techniques able to classify
the welded products automatically through the evaluation of the process signals recorded
during the spot welding process and the image of the realised spot weld.
Adaptive control techniques can be used to cope with more or less slowly varying
process parameters like defocusing (thermal drift), laser power at the work piece
(pollution), absorption (material lot), slowly increasing gap between the parts (wear of
tools), etc. Some process conditions can change from weld to weld, like for instance the gap
between the parts or the absorption coefficient due to variations in surface conditions. Real-time feedback control of the laser spot welding process within each process time is the only
option to cope with this range of process condition variations.
The state of the art technique at present for spot welding of difficult materials like
copper and aluminium is to change the laser power during the process according to a fixed
pre-determined pattern. For copper welding, the power should be very high at the beginning
in order to get the process running, while it must be decreased rapidly as soon as the
melting phase starts. Research activities are now aimed at the development of
adaptive control strategies that let the system tune automatically to the optimal pulse
shape, based on the evaluation of the process signals.
A real-time control loop on top of this adaptive loop is needed to handle instantly
acting process disturbances and to keep the process stable during the spot weld.
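The two-level arrangement described above can be sketched as follows; the gains, signals and pulse samples are purely hypothetical placeholders, since the actual controller details are not given in the text:

```python
# Two-level control sketch (hypothetical signals and gains): an outer adaptive
# loop updates the nominal pulse shape from weld to weld, while an inner
# real-time loop corrects the power within each pulse.

def realtime_correction(setpoint, measured, gain=0.5):
    # Inner loop: proportional correction applied sample-by-sample during the pulse.
    return gain * (setpoint - measured)

def adapt_pulse_shape(pulse, errors, rate=0.1):
    # Outer loop: nudge each sample of the nominal pulse against the
    # per-sample process-signal error observed in the previous weld.
    return [p - rate * e for p, e in zip(pulse, errors)]

pulse = [10.0, 8.0, 4.0, 2.0]     # high initial power, rapid decrease (copper-like)
errors = [1.0, 0.5, 0.0, -0.5]    # per-sample errors recorded in the last weld
new_pulse = adapt_pulse_shape(pulse, errors)
```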
In real applications it is common to iterate the system partitioning procedure only for a
couple of iterations, since more complex configurations do not in general provide significant
improvements (the approximation ability of the complex composite system becomes
equivalent to that of a simpler one). An example of decomposition is given in figure 5a, where we
denote by T a traditional, specified, module which acts, most of the time, as a feature
extraction element executing a traditional computation. Figure 5b shows the same
composite system after the parallel modules have been moved before the T one and collapsed into a
single module M. Of course, in applying these operators the obtained composite module
must be feasible.
Figure 5: a) A composite system example; b) the composite system after the moving and collapsing operations
that several neural network families are universal function approximators. Based on this
example the reader can understand why composite system partitioning applied to the
original solution (figure 6a) can lead to simpler and computationally less demanding
solutions (figure 6e). During the training phase one should consider the most adequate
parameter tuning algorithm. On complex applications one should prefer Quasi-Newton
derived algorithms over pure gradient-based procedures such as back-propagation, since
pure gradient descent algorithms become quite ineffective around the minimum of the
training function. Among the most efficient Quasi-Newton training algorithms we
encounter BFGS, DFP and Levenberg-Marquardt, along with their recurrent variants. For a
review of the different training algorithms please refer to [12].
While BFGS is surely the most effective training algorithm, it is also particularly time
consuming. The Levenberg-Marquardt algorithm is a good compromise in these cases since
it is quite effective while requiring only a reasonable computational burden.
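The mechanics of Levenberg-Marquardt as a sum-of-squares minimiser can be tried out with SciPy's least_squares(method="lm"); the toy curve-fitting problem below is our own illustration, not an example from the chapter:

```python
import numpy as np
from scipy.optimize import least_squares

# Recover the parameters of y = c * exp(-a * x) from sampled data by
# minimising the sum of squared residuals with Levenberg-Marquardt.
x = np.linspace(0.0, 4.0, 50)
y = 2.0 * np.exp(-1.3 * x)

def residuals(p):
    c, a = p
    return c * np.exp(-a * x) - y

# method="lm" selects the Levenberg-Marquardt implementation (MINPACK).
fit = least_squares(residuals, x0=[1.0, 1.0], method="lm")
```

The same machinery applies to a neural network trained on a Mean Square Error function: the residual vector is then the per-pattern output error and p the network weights.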
10.4.1.6 Modules and system validation
Validation of the modules composing the system and, by extension, of the
whole system itself can be accomplished with several validation techniques [7,9].
Surely, the easiest one is Cross Validation (CV), which partitions the data set of N samples into
two subsets, the first used for training the module, the second to validate it. For
ease of presentation, we apply the analysis to a classification problem. The accuracy of the
classifier, evaluated on a validation set composed of Nv samples, is simply Nv,OK / Nv,
where Nv,OK represents the number of correct classifications in the validation set.
Cross-validation undoubtedly provides an unbiased estimate of the validation accuracy of the
module. Nevertheless, cross-validation suffers from a main disadvantage: the confidence of
the estimate depends on the available set and, if the number of validation pairs Nv is limited,
so is the confidence associated with the results. In addition, cross validation is in contrast
with a leading statistical philosophy: saving data for validating the model reduces the data
available for training and, hence, the model is less accurate [7,9]. K-fold cross validation techniques can be
considered when there is a limited number of data and the complexity of the training
elements is not too high.
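The accuracy estimate Nv,OK/Nv and its k-fold variant can be sketched as follows; the 1-NN demo classifier and data are illustrative, not the chapter's modules:

```python
import numpy as np

def kfold_accuracy(train_fn, classify_fn, x, y, k=5, seed=0):
    """K-fold cross validation estimate of accuracy: the ratio of correct
    classifications N_v,OK / N_v, accumulated over the k validation folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    correct = 0
    for i, va in enumerate(folds):
        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = train_fn(x[tr], y[tr])
        correct += int(np.sum(classify_fn(model, x[va]) == y[va]))
    return correct / len(x)

# Demo with a 1-NN classifier ("training" just stores the reference patterns):
x = np.array([[0.0], [0.1], [0.2], [0.05], [1.0], [1.1], [1.2], [1.05]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
store = lambda xt, yt: (xt, yt)
nn1 = lambda m, xv: np.array(
    [m[1][np.argmin(np.linalg.norm(m[0] - p, axis=1))] for p in xv])
acc = kfold_accuracy(store, nn1, x, y, k=4)
```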
Figure 7: Cross validation and Leave One Out
Following this last comment we end up with different validation criteria which suggest
using most of the available data for training and just a few, if any, for validation.
The FPE, NIC and GPE criteria and Leave One Out (LOO) follow this principle [7,9]. LOO is an
interesting validation criterion and constitutes the basis for the feature selection core
presented in section 4.2. With LOO, given N samples, we have to develop N classifiers,
each trained over an (N-1)-sample subset and validated on the single withheld pattern.
The procedure iterates over all patterns, and the performance estimate is simply the ratio
between the number of correctly classified patterns and the total number of patterns,
i.e., Ntot,OK / N. A graphical comparison between CV and LOO
is presented in figure 7. The presence of a limited data set poses an additional problem
related to the confidence of the validation index. When we assert that the validation
performance is x% we have to remember that the validation index is a random variable
depending on the particular realisation of the data: different data would have generated a
different validation performance. A confidence degree must then be introduced to grant, at
least in probability, what we are asserting.
For classification applications, the ones envisaged in our laser test-beds, we can assume
first of all that the generic trained classifier coincides with the optimal Bayes one. In such a
case, depending on the number of data N and the measured validation accuracy, with a
confidence of 95% [7] we have the situation depicted in figure 8. The entry point is the
measured accuracy, which intersects the two curves associated with a given N, and we read
off the interval to which the real validation performance belongs with probability at least
0.95. Since we are assuming that the considered classifier is optimal, the approach provides
the best possible solution. Nevertheless, since we cannot guarantee that our classifier is the
optimal one, we can only state that, in the best case, our classifier will be characterised by
the given performance interval.
Figure 8: confidence intervals (95%) for the validation performance as a function of the measured accuracy, for different N
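The interval read off the curves of figure 8 can be approximated numerically. The sketch below uses the normal approximation to the binomial, which is our assumption and not necessarily the construction used in [7]:

```python
from math import sqrt

def accuracy_interval(acc, n, z=1.96):
    """95% confidence interval for the true accuracy, given a measured
    validation accuracy `acc` over n samples (normal approximation to the
    binomial; the curves of figure 8 play the same role)."""
    half = z * sqrt(acc * (1.0 - acc) / n)
    return max(0.0, acc - half), min(1.0, acc + half)

# With 100 validation samples, a measured 95% accuracy is only known
# to within roughly +/- 4 percentage points:
lo, hi = accuracy_interval(0.95, 100)
```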
The final choice of the sensors to be considered derives, of course, from the
background experience of the team involved in the set-up, available information and hints
from the related literature, economic issues and, last but not least, the dimensions of the
optical head. In many laser applications the laser head must be placed in a spatially
constrained environment, and sensors mounted on it make it difficult to move (wiring
problems, weight, robustness of the sensors when subjected to strong forces due to
accelerations).
A methodology based on a sensitivity (feature relevance) analysis encompassing KNN
classifiers has been developed to solve the feature extraction aspect. The basic idea
supporting the methodology is that a feature is relevant to a classification task if it provides
an additional contribution to performance improvement. Unfortunately, the feature extraction
problem is NP-hard in the sense that all possible classifiers receiving all possible groups
of features must be envisaged. Moreover, for each classifier we have to consider a training
phase which, by itself, is time consuming. A methodology can be derived, based
on KNN classifiers, which solves the feature extraction problem with polynomial-time
complexity in the number of features.
The heuristic approximates the different actors of the feature extraction
problem by assuming that the KNN classifier coincides with the optimal Bayes classifier (i.e.,
the analysis is optimistic). The assumption is supported by the fact that the KNN classifier
is a consistent estimate of the Bayes one (when the number of data tends to infinity the
performance provided by the two coincides). The relevant advantage of a KNN w.r.t. other
consistent classifiers (e.g., feedforward and Radial Basis Function NNs) is that it does not
require a computationally intensive training phase; in fact, it is simple to generate a KNN
from a set of training data. To validate the effective performance of the obtained classifier
we considered a LOO validation technique at 95% confidence. We can therefore state
that the performance of the classifier is reliable with high probability. This procedure
should be iterated for all the possible classifiers receiving all possible combinations of
features. A solution to this problem is given by the following algorithm:
The classifier maximising the LOO performance is the best one for the particular problem.
The interesting related effect is that the features it receives are the most relevant ones for
the envisioned application, and the sensors generating such signals should be
considered in the experimental set-up.
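Since the algorithm listing did not survive reproduction, the following is only a plausible greedy forward-selection reading of the described procedure, scored with LOO on a 1-NN classifier; the data and function names are illustrative:

```python
import numpy as np

def loo_knn_accuracy(x, y):
    # Leave-One-Out with a 1-NN classifier: no training phase is required,
    # which is the computational advantage the text attributes to the KNN.
    ok = 0
    for i in range(len(x)):
        d = np.linalg.norm(x - x[i], axis=1)
        d[i] = np.inf                      # exclude the held-out sample
        ok += int(y[np.argmin(d)] == y[i])
    return ok / len(x)

def greedy_feature_selection(x, y):
    # Greedy forward selection: add, one at a time, the feature that most
    # improves the LOO accuracy; stop when no feature helps any more.
    remaining = list(range(x.shape[1]))
    chosen, best_acc, improved = [], 0.0, True
    while improved and remaining:
        improved = False
        acc, f = max((loo_knn_accuracy(x[:, chosen + [g]], y), g)
                     for g in remaining)
        if acc > best_acc:
            chosen.append(f)
            remaining.remove(f)
            best_acc, improved = acc, True
    return chosen, best_acc

# Feature 0 separates the two classes; feature 1 is uninformative noise.
x = np.array([[0.00, 0.7], [0.10, 0.3], [0.20, 0.9], [0.05, 0.2], [0.15, 0.5],
              [1.00, 0.4], [1.10, 0.8], [1.20, 0.1], [1.05, 0.6], [1.15, 0.55]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
chosen, best_acc = greedy_feature_selection(x, y)
```

The complexity is polynomial in the number of features, as claimed in the text, because at most F rounds over at most F candidates are scored.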
10.5. Applications
10.5.1 Laser cutting of steel/stainless steel
Laser cutting of steel/stainless steel is a complex process, yet an
interesting application for its industrial and economic impact.
In this application, carried out in collaboration with TRUMPF GmbH, we observed that by
monitoring the evolution of the sparks jet generated during cutting we can extract
the information needed for quality analysis.
During the cutting process, non-optimal situations can occur which impair the quality of
the produced artefact. The two most relevant ones are:
- a discontinuous cut (there are segments of uncut material);
- pearls of metal (i.e., melted material which deposits and solidifies on the cut edges).
Three examples of sparks' jets retrieved during the cutting process from the bottom side
of the cut artefact are given in figure 9. Sparks jets vary both in intensity and shape and, in
some extreme cases, the main jet separates into two parts. The rightmost scene refers to a
pearl of metal situation; pearls are visible in the figure as hot spots of melted material
which deposit on the lower edges of the cut.
In this first application we apply the whole methodology in detail; in the subsequent
applications we focus on the main results.
The first step of the methodology addresses the generation of a set of candidate
composite systems for the application starting from a simple high level design of the
solution. To this end, we note that the pearls of metal situation is completely different from
the other cases since here the sparks jet somehow degenerates. As such, we consider two
distinct solutions, one dealing with the identification of the pearls and the other with the
continuous/discontinuous cut.
In the pearls of metal case a pattern-matching filter tailored to the size/nature of the
pearls becomes a straightforward high level solution; the filter solves the specific problem
by classifying a sub-image as pearls free or affected. Since the filter's coefficients and
structure are unknown we have to consider a model-free module to be suitably identified.
Conversely, identification of discontinuous cuts is a more complex problem and
requires, a priori, identification of a set of features. Features must be related to the structure
of the sparks jet (e.g., some angles characterising the aperture of the jet) augmented with
external features such as cutting speed, type and pressure of gas used and thickness of the
material to be processed. It is reasonable to consider traditional modules for extracting the
internal features while an unspecified module acting as a classifier will process the features
to characterise the local quality of the cut.
The high level structure of the composite system is given in figure 10. Of course, other
composite systems could be generated from the first one by considering parallel and series
decomposition of the M modules as well as moving and collapsing operators. In particular,
we verified that a single composite system of T-M nature (a Traditional module followed by a
parameterised model) can be considered to solve the problem. In this case the M module
receives also features related to the presence of pearls of material. Unfortunately, such a
composite system, even if less computationally intensive, is characterised by poorer
performance.
Figure 10: The chosen composite system (a T feature-extraction module operating on the image feeds an M discontinuous-cut classifier, supported by the external inputs cut speed, gas used and thickness; a separate M pearl classifier contributes to the good cut/bad cut decision)
The second step to be accomplished refers to model family selection for the model-free
modules. We considered feed-forward classifiers since they have been proven to be
universal function approximators. The specific neural computation, e.g., sigmoidal-based or
RBF, is not relevant at this step.
The third phase requires generating a set of relevant features for the specific
application. Since the features must characterise the nature of the sparks jet, we considered the
set of angles outlined in figure 11. In particular, the potentially interesting angles are the
inclination angle α of the core of the sparks jet, the aperture angle β of the core of the jet
and the angle γ characterising the opening of the whole jet. Please note that at this
abstraction level we do not know which features will be relevant to the process.
Identification of the feature angles is anything but an easy task, since the sparks jet is
rather noise-affected (several sparks are outside the main core): this observation has an
immediate impact on the computational load and the complexity of the solution.
In particular, the main difficulty is associated with the identification of the origin of the
sparks, which represents the reference point for angle determination. The presence of
significant noise in the image and the imperfect linearity of the sparks' trajectories once
ejected make the identification of this reference point particularly complex.
Figure 11: The internal features extracted from the sparks jet
The high level steps leading to the identification of the reference point can be
summarised as follows:
- Preliminary identification of the reference point. The critical information is the
horizontal co-ordinate which can be estimated by identifying a "first significant
increment" of the luminous intensity of the spark's core, along the vertical direction;
- Application of the Radon Transform to compute the direction of the principal axis of the
sparks jet. The minimum value assumed by the variance of the projections indicates the
main jet's axis;
- Intersection of the horizontal line with the principal axis;
- Application of a Least Mean Squares technique;
- Translation of the principal axis to the jet starting point, obtaining the α angle.
The β and γ angles can then be estimated with an additional processing of the image. In
particular, the steps to be accomplished require:
- Median filtering;
- Image binarisation;
- Cumulating the intensity in rows;
- Finding the left/right edges of the sparks jet;
- Applying a linear regression on the left and right sides, imposing that the fitted lines pass
through the vertex.
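As a compact stand-in for the Radon-transform step, the principal axis of the jet can also be estimated from intensity-weighted image moments; this moment-based shortcut is our substitution, not the method used in the chapter:

```python
import numpy as np

def jet_axis_angle(img):
    """Orientation (degrees) of the intensity-weighted principal axis of an
    image: the maximum-variance direction of the pixel mass approximates the
    main jet axis that the Radon-based step looks for."""
    ys, xs = np.nonzero(img > 0)
    w = img[ys, xs].astype(float)
    w /= w.sum()
    mx, my = (w * xs).sum(), (w * ys).sum()
    cxx = (w * (xs - mx) ** 2).sum()
    cyy = (w * (ys - my) ** 2).sum()
    cxy = (w * (xs - mx) * (ys - my)).sum()
    # Standard image-moment orientation formula.
    return 0.5 * np.degrees(np.arctan2(2.0 * cxy, cxx - cyy))

# A synthetic diagonal "jet" should yield a principal axis of 45 degrees.
img = np.zeros((64, 64))
np.fill_diagonal(img, 1.0)
angle = jet_axis_angle(img)
```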
Once the features have been extracted and input/output pairs generated, the next step
requires training the classifiers. The chosen training algorithm was Levenberg-Marquardt applied to a Mean Square Error training function.
10.5.1.1 Pearls classifier
The neural topology is a simple network receiving a 15x18 pixel image; it is
characterised by two hidden layers with 12 and 6 hidden neurons respectively, and provides
the indication pearls/no_pearls. We considered a two-hidden-layer neural network since two
layers can solve a complex application with a lower number of hidden units. The neural
network has been trained on a set of images containing pearls and non-pearls situations
(some training examples are given in figure 12). In a way, the neural network behaves as a
non-linear pattern-matching filter which scans the image looking for potential pearls of
material. Once pearls are identified, the cut at the instant of time the image was
retrieved by the camera is classified as bad. Conversely, we cannot guarantee that the cut is
error free when the classification is no_pearls, since other sources of defects can
be present.
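The described topology can be sketched as a plain forward pass. The weights below are random placeholders, since the trained network is of course not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    # Random initialisation only: the trained coefficients are not available.
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

# 15x18 pixel sub-image -> 12 -> 6 -> pearls/no_pearls, as described in the text.
W1, b1 = layer(15 * 18, 12)
W2, b2 = layer(12, 6)
W3, b3 = layer(6, 1)

def classify_subimage(img):
    x = img.reshape(-1)
    h1 = np.tanh(x @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    out = 1.0 / (1.0 + np.exp(-(h2 @ W3 + b3)))  # sigmoid output unit
    return "pearls" if out[0] > 0.5 else "no_pearls"

label = classify_subimage(rng.random((15, 18)))
```

In use, the network scans the camera image window by window, exactly as a pattern-matching filter would.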
Table 1: Application structure, features considered and validation error for two classes

Application structure                  Features considered              Validation error
Mild steel, 6 mm thickness, O2         Pressure, cutting speed          46.7%
Mild steel, 6 mm thickness, O2         Pressure, cutting speed, α, β    0%
Stainless steel, 3 mm thickness, O2    Pressure, cutting speed          6.1%
Stainless steel, 3 mm thickness, O2    Pressure, cutting speed, α, β    1.1%
Before training the classifiers we ran the feature selection algorithm for each class and
identified the most suitable features for each of them. Interestingly enough, some classes
need only the external features to solve the specific application, while other classes
improved significantly by also considering the angles. As an example we consider two classes
with structure, features and results as given in table 1. We realise that an adequate choice
of the composite system and features is fundamental to solve the application.
By considering the α and β angles the performance of the classifiers improves. Results
have been estimated over a large data set and, as such, the validation error is a consistent
estimate of the effective accuracy of the classifiers.
10.5.2 Laser seam welding of gears
In general, the quality analysis of a seam welding process is assessed by an offline
inspection of each welded component carried out with ultrasonic or X-ray devices. A
different approach envisages an on-line quality analysis which is implemented directly
during the welding process. By following this principle we address the laser welding
quality analysis with a composite system that detects defects on-line directly during the
welding phase. The industrial process under monitoring refers to the laser welding of
automotive components carried out at the CRF-FIAT laboratories. The specific test bed is a
steel gear, a critical part in the gearbox for a passenger vehicle obtained by joining the two
rings composing the gear with a CO2 laser.
Small power changes once the keyhole has been formed can cause remarkable changes
in the weld results while the presence of non-metallic contaminants may produce spatters
and porosity in the welded region. In the considered application we wish to identify errors
associated with
- Porosity (spontaneous and caused by misalignment or power lack);
- Decrease in laser power level (up to 10% of the nominal value)
- Mounting errors (the two pieces to be welded are misaligned)
The signals acquired during the welding process are the laser power signal (figure 14
left) and the infrared radiation (figure 15 left) coming from the welding process.
Due to the nature of the application we simply opted for a T-M composite system for
each problem.
The features relevant to the process have been obtained by inspecting signals coming
both from correct and anomalous operations.
We discovered that the laser power signal possesses enough information to detect power
decrease or power lack. Conversely, the signal from the photodiode is suitable for carrying
out the quality analysis for porosity formation and for the detection and identification of
misalignment errors.
In particular, and referring to figure 14b, we see that a power decrease is visible. As
features, we extracted the power decrease (F) and the time duration (T) associated with the
part of the power signal above its mean value.
To identify the features associated with the infrared radiation we applied a low-pass
filter to remove high-frequency components in the signal, and a cubic interpolation of the
signal to obtain a reference signal.
The features to be extracted, relevant to porosity formation, are the deviations from the
reference signal, i.e., the time duration (A) and the amplitude (D) of the main deviations.
The feature selection phase validated the choice of the features.
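A minimal sketch of this feature extraction, assuming a moving-average low-pass filter and a flat reference signal (both assumptions; the chapter does not specify either):

```python
import numpy as np

def lowpass(signal, width=9):
    # Moving-average low-pass filter (a simple stand-in for the unspecified
    # filter used to remove high-frequency components).
    return np.convolve(signal, np.ones(width) / width, mode="same")

def deviation_features(signal, reference, threshold=0.1):
    """Amplitude D and duration A (in samples) of the main deviation of the
    filtered signal from the reference; the threshold is illustrative."""
    dev = np.abs(lowpass(signal) - reference)
    return dev.max(), int(np.sum(dev > threshold))

# A rectangular bump deviating from a flat reference:
reference = np.zeros(100)
signal = np.zeros(100)
signal[40:60] = 1.0
D, A = deviation_features(signal, reference)
```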
The misalignment of the parts to be butt-welded can be identified by processing the
infrared signal. A characteristic signal measured in presence of a misalignment is given in
figure 16.
We extracted as relevant features the indices H and L, which represent the amplitude
between the two stationarity points and the corresponding time interval, respectively. Such
features have been extracted on the cubic interpolation to reduce the computational burden.
The final classifier has been trained to solve each class of errors based on a set of good/no_good welding experiments.
To compare classification performance on this test-bed we considered both a one-hidden-layer
feed-forward neural classifier (FF-NN) and the KNN one. Results are given in table 2,
where the FF-NNs have been validated with cross-validation and the KNNs with the
Leave-One-Out validation technique.
Since the application is characterised by a significantly small number of samples, we
introduced the accuracy intervals as suggested by figure 8 (confidence level of 0.95).
Table 2: KNN and FF-NN classification performance in the seam-welding of gears

Error class          Classifier   Training samples   CV/LOO samples   Validation error   Accuracy interval   Notes
Power decrease       KNN          69                 68               0%                 0-8%
Power decrease       FF-NN        48                 21               0%                 0-8%                4 hidden units
Mounting error       KNN          55                 54               1.8%               0-10%
Mounting error       FF-NN        39                 16               0%                 0-8%                2 hidden units
Porosity presence    KNN          215                214              0.35%              0-4%
Porosity presence    FF-NN        199                86               0%                 0-4%                4 hidden units
We can see that the best neural classifier has a maximal complexity of 4 neurons,
noticeably lower than the complexity of the corresponding KNN classifier, which requires
comparing the actual pattern with every training one. We note that the feed-forward neural
networks always provide a 0% validation error.
10.5.3 Automatic classification of laser spot welded electron gun parts
Micro spot welding has been used in the production of electron gun parts for many
years. The advantages of low thermal and mechanical loading of the product are the main
reasons for using this joining technology. The production plant of Philips Electronics in
Sittard has over 100 lasers in use to assemble the parts for the electron guns and the final
assembly itself, which has about 120 laser spot welds on each finished product. Several
types of spot welds are used in this production facility, of which the overlap penetration and
the overlap fillet weld are the most important. Figure 17 shows a schematic overview of the
electron gun, which generates the electrons, focuses them into a narrow beam and
accelerates them for their travel to the screen.
For several years, investigations have been carried out to find techniques able to
classify the welded products automatically through the evaluation of the process signals
recorded during the spot welding process. This case study describes some of the major results
obtained over the recent years on this topic, both with traditional process monitoring
techniques as well as with sophisticated Neural Network based automatic classifiers.
The goal of the activities was to develop an automatic classification tool, recognising
defective welded products on-line, based on the evaluation of measured process phenomena
during the welding process. The reliability of the classifier should be high:
- A few percent of good products in the lot of products classified as bad is acceptable.
- Only a few ppm of bad products in the lot of products classified as good is accepted.
Training of the automatic classifier should be as simple as possible, enabling easy
implementation in a production environment.
Defining the quality of a laser spot welded joint requires more than a simple description.
Several characteristics of the joint have to be taken into account to describe the quality.
Although the electron gun parts as such are mechanically static products, the
mechanical demands on the product parts are important due to the high thermal loads on
the products (cyclic temperature stressing). The spot welds for the electron gun parts must
meet a set of demands to be acceptable:
- Good penetration depth, identified by recognising a certain de-colouring of the material
on the bottom side of the work piece.
- Good alignment of the joined parts, the gap between the parts should not be too large.
- No spatters of steel particles in the vicinity of the weld.
Because of the variety of parameters to be taken into account, it is not enough to
evaluate these parameters on the basis of only one sensor. A set of sensors is selected as the
basis for the evaluation of the quality parameters.
10.5.3.1 Multi sensor process monitoring
The micro spot welding process can be divided into three successive process phases:
Heating, melting and fusion, and cooling. The work piece surface temperature is increased
from room temperature up to melting temperature during the heating phase. The laser
beam power used and the absorption coefficient of the work piece material are the most
dominant parameters influencing the evolving process here. The input laser power and
part of the reflected laser power are detected to gain information about these parameters.
The surface temperature, the amount of metal evaporation and the penetration depth are
important factors during the melting and fusion phase. The infra red emission from the
weld spot, the optical emission from the plume and the change of induced eddy currents are
detected to monitor these process parameters. Surface temperature is also an important
parameter during the cooling phase.
It is essential to note that there are several combinations of process parameters which
give quite different process behaviour but still lead to good welding quality. This means
that good spot welding quality is not uniquely related to a specific, narrowly bounded
combination of process parameters. Selecting another combination of process parameters
(for instance a longer process time with lower laser power, giving less evaporation of
metal) will lead to different process signals while maintaining the welding quality. This
makes it even more important to use a broad set of sensors for the process-monitoring task.
The set of signals used for the final test of automatic classification was:
1. Laser input power
2. On-axis reflected laser power (back-reflected into the aperture of the welding set-up)
3. Off-axis reflected laser power (diffusely reflected laser power)
4. Plume emission
5. Variation of the angular orientation of the plume w.r.t. the surface
6. Surface temperature (logarithmically amplified visible emission, Silicon sensor)
7. Surface temperature (logarithmically amplified i.r. emission, Germanium sensor)
8. Sonic acoustic emission (microphone)
Figure 18: Example of some signals and features extracted from these signals
The general approach to the classification problem is to extract features from the
process signals and to compare this set of features with the feature sets of welds whose
quality levels are known. Figure 19 shows the approach via a schematic overview of the
used functional blocks.
Figure 19: Approach to automatic classification of the quality of micro spot welding
done under various combinations of process parameters. Process parameters have been
varied over the range we expect to face in industry.
The complete set of features extracted from the data files, in combination with the off-line classification, gives us the reference patterns for the process. This completes the
'training' process for the KNN classifier.
Testing of the classifier is done on the basis of the 625 available experimental data sets with
known classification. A number of steps have to be taken to test the performance of the
classifier, which can be divided into three different parts:
1. Loading of the original recorded 'MAIL' data file
2. Extraction of all features for all data files
3. Selecting the appropriate features (based on the feature selection results)
4. Normalisation of the features (to unity variance and zero mean)
5. Splitting of the data set into a part used for training and for testing
6. Reading the results from the manual classification (excel file)
7. Invoking the classifier to classify the 'test-set'
8. Verification of the performance of the classifier by comparing its output with that of the
manual classification.
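Steps 4, 5 and 7 above can be sketched as follows; the data are a synthetic stand-in for the experiment sets and the function names are ours:

```python
import numpy as np

def normalise(features):
    # Step 4: scale every feature to zero mean and unit variance.
    mu, sd = features.mean(axis=0), features.std(axis=0)
    return (features - mu) / np.where(sd == 0, 1.0, sd)

def split(x, train_fraction=0.8, seed=0):
    # Step 5: random split into a training part and a test part.
    idx = np.random.default_rng(seed).permutation(len(x))
    cut = int(train_fraction * len(x))
    return idx[:cut], idx[cut:]

def knn_classify(x_train, y_train, x_test, k=1):
    # Step 7: classify each test pattern by a k-NN vote against the stored
    # reference (training) patterns.
    preds = []
    for p in x_test:
        nearest = np.argsort(np.linalg.norm(x_train - p, axis=1))[:k]
        preds.append(np.bincount(y_train[nearest]).argmax())
    return np.array(preds)

# Synthetic stand-in for the experiment data sets (two separable classes):
rng = np.random.default_rng(3)
x = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(5.0, 0.3, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
xn = normalise(x)
tr, te = split(xn)
pred = knn_classify(xn[tr], y[tr], xn[te])
```

Step 8 then amounts to comparing `pred` with the manually assigned labels of the test set.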
A general conclusion from the work in 'MAIL' is that the results are encouraging but
not yet reliable enough for industrial implementation.
The results of our classifier testing showed a level of 98% correct classifications and 2%
incorrectly classified cases for the 1NN classifier. The 2NN classifier showed a
performance level of 95.5% correct classifications and 1% incorrect classifications; in 3.5%
of the cases the classifier did not decide. Within the 1% incorrect classifications were
also 'Bad' welds which had been classified as 'Good', a situation we would like to avoid.
Of course, since the number of data is finite, a 6-8% accuracy interval must be
considered, centred around the nominal accuracies obtained.
References
Chapter 11
Neural Networks
for Measurements and Instrumentation
in Electrical Applications
Salvatore BAGLIO
Dipartimento di Ingegneria Elettrica Elettronica e Sistemistica, University of Catania
V.le A. Doria 6, Catania, 95125 Italy
Abstract. This chapter gives an overview of the use of soft computing
methodologies in measurement systems for electrical quantities, although the
presented approaches can be extended to other quantities whenever these
are first converted into electrical ones. The basic electrical
properties of materials (e.g., resistivity and permittivity) and the methods for their
measurement are introduced. Then a brief discussion of the soft computing
methodologies that can be used in measurement systems is given. Finally,
some real applications of soft computing technologies in measurement systems for
industrial applications are presented.
approaches to measure these changes are very important from a measurement point of view.
This leads in fact to relate the changes to the physical phenomena inducing them and, in
turn, to find ways to indirectly measure these phenomena.
An analytical model that relates the electrical resistivity to all physical quantities
influencing it can be derived from the simple system shown in Fig. 1. The bar is made of an
isotropic and homogeneous material; its length, width, and height are l, w, h, respectively.
The voltage difference V is applied across its length and the current I flowing in the bar is
measured.
In the bar of isotropic and homogeneous material the electric field E produced by the
voltage V is:

E = V / l [V/m] (1)

and the current density through the cross-section A = w·h is:

J = I / A [A/m²] (2)

The relation between the current density and the electric field leads to the resistivity
definition given above:

E = ρJ (3)

Therefore, it is:

ρ = (V / I)(A / l) [Ω·m] (4)
The resistance of the bar is therefore defined by including not only its electrical
properties but also its geometrical dimensions. The resistance, measured in Ohm [Ω], is:

R = ρ l / A [Ω] (5)

In general, the resistance R depends on the size and the shape of the object, while the
resistivity ρ depends only on the properties of the object material.
Finally, the proportionality between the voltage applied across the conductor bar and the
resulting current flowing in the bar itself is stated by the well-known Ohm law:

V = R I (6)
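As a small numerical illustration of the resistance formula R = ρl/A for the bar of Fig. 1 (the copper resistivity used below is an assumed, approximate figure, not a value from the chapter):

```python
def resistance(rho, length, width, height):
    """R = rho * l / A for a homogeneous bar of cross-section A = w * h."""
    return rho * length / (width * height)

# Copper bar, rho ~ 1.7e-8 ohm*m (illustrative), 1 m long, 1 cm x 1 cm section.
R = resistance(1.7e-8, 1.0, 0.01, 0.01)
```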
For a conductor, the resistivity can be expressed as:

ρ = m / (n e² τ) (7)

where m is the mass of the electron, n is the density of free charge carriers, τ is the average
time between two collisions of a free carrier with the stationary atoms of the material, and e is
the electron charge. Equation 7 can be useful to understand, for example, the behavior of
thermal sensors based on the change of resistivity. The resistivity of metals usually
increases with temperature since τ decreases, while the resistivity of semiconductors
usually decreases with temperature since n increases.
Measuring the resistivity (or the resistance) is not always easy: in general, it is rather
difficult to make good electrical contacts in order to get good measures. In the "two-point"
measurement scheme shown in Fig. 1, contact resistances may easily lead to
meaningless measurements.
A well-known method to measure the resistivity (the "four-point technique") is based
on separate voltmetric and amperometric measures, performed by using two separate
circuits (Fig. 2). The separate voltmetric circuit allows for neglecting the effects of the
contact resistance in the evaluation of the resistivity. In fact, if the input impedance of the
voltmeter is large enough, a very small current flows through the voltmetric circuit and,
thus, the voltage drop across its contact resistances can be neglected. The high currents in
the amperometric circuit generate undesired effects as in the two-point technique; however,
in the "four-point" approach, the contact resistances are located outside the region of the
voltmetric measurement and, consequently, do not affect the voltage reading.
The resistivity is thus obtained from the measured voltage and current and the geometry of
the bar:

ρ = (V / I)(w h / l) (8)

This relationship holds when the physical dimensions of the object are large enough with
respect to the measurement area. Excessive currents should be avoided, since heating can
affect the measure, although high voltage readings are desirable to increase the reading
accuracy.
11.1.2 Permittivity measurements
Dielectric materials have relatively few free charge carriers: most of the charge carriers are
in fact bonded and cannot participate in conduction. Therefore dielectric materials have
high resistivities, typically of the order of 10¹⁵-10¹⁸ [Ω·m].
However, an external electric field can displace the bonded charges. Atoms or
molecules form electric dipoles that tend to oppose the applied electric field. A dielectric
material that exhibits a nonzero distribution of these charge separations is called polarized.
The volume density of the polarization P describes the volume density of the dipoles.
For a linear, isotropic material, the polarization density is related to the applied field E:

P = ε₀ χₑ E (9)

where ε₀ = 8.854×10⁻¹² [F/m] is the permittivity of vacuum, and χₑ is called the electric
susceptibility of the material.
The electric flux, or displacement, D is defined as:

D = ε₀E + P = ε₀(1 + χₑ)E = ε₀εᵣE (10)

where ε = ε₀εᵣ is the permittivity of the material, and εᵣ is its relative permittivity (or
dielectric constant).
A material whose characteristics depend on frequency is called dispersive. For time-harmonic
fields (e.g., e^{jωt}), the generalized Ampere law is, in the phasor form:

∇×H = Jₑ + J + jωD (11)

where H is the magnetic field intensity [A/m], Jₑ is the source current density [A/m²], J is
the conduction current density [A/m²], and jωD represents the displacement current
density. Jₑ will be zero in a source-free region.
Since J = σE, where σ is the conductivity of the material [S/m], it is:

∇×H = Jₑ + σE + jωεE (12)

The conduction current represents a loss of power. In dielectric materials there is another
source of loss. When a time-harmonic electric field is applied, the dipoles flip back and
forth continuously. Since the charge carriers have finite mass, the field must perform work
to move them and, moreover, they might not respond instantaneously. The polarization
vector will lag behind the applied electric field.
This behavior is described by the complex relative permittivity:

ε(ω) = ε∞ + (εs − ε∞) / (1 + (jωτ)^(1−α)) (13)

where ε∞ and εs are the relative permittivity at infinite and zero frequency, respectively,
and τ is the characteristic relaxation time in seconds. For α = 0, Eq. 13 is the Debye
equation.
The above parameters significantly change both with material and frequency. From the
measurement point of view, these changes are useful to realize capacitive sensors, for
253
example to estimate the nature of a material. As an example, Tables 1 and 2 report the
dispersion parameter and the complex permittivity for some frequencies and some
materials.
Table 1: Dispersion at room temperature

material | εs | ε∞ | α | τ [ps]
water | 78 | 5 | 0 | 8.0789
ethanol | 24 | 4.2 | 0 | 127.8545
acetone | 21.2 | 1.9 | 0 | 3.3423
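Eq. 13 together with the Table 1 parameters can be evaluated numerically. The sketch below assumes the e^{jωt} phasor convention, under which the imaginary part of the permittivity comes out negative (ε = ε′ − jε″); the chosen 1 GHz evaluation frequency is illustrative.

```python
import math

def complex_permittivity(freq_hz, eps_s, eps_inf, tau_s, alpha=0.0):
    """Cole-Cole dispersion (Eq. 13); alpha = 0 gives the Debye equation."""
    jw_tau = 1j * 2 * math.pi * freq_hz * tau_s
    return eps_inf + (eps_s - eps_inf) / (1.0 + jw_tau ** (1.0 - alpha))

# Water at room temperature with the Table 1 parameters, evaluated at 1 GHz.
eps_water_1GHz = complex_permittivity(1e9, eps_s=78, eps_inf=5, tau_s=8.0789e-12)
```

At 1 GHz, well below water's relaxation frequency, the real part stays close to εs while a small negative imaginary part accounts for the dielectric loss.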
The capacitance of a parallel-plate cell filled with the material is C = εᵣC₀, where C₀ is the
capacitance of the empty cell:

C₀ = ε₀ A / d (16)

In the case of guard electrodes (Fig. 3), often used to reduce fringe effects, the smaller plate
must be considered to evaluate C₀.
(17)

The first term represents the charging current of the upper capacitor: this current is not
measured. The second term is the charging current of the lower branch of the equivalent
circuit. The time constant τ can be determined, and the resistance R can be estimated by
extrapolating the curve for t = 0.
11.1.3 Permeability measurements
A magnetic field interacts with any material that is immersed in the field itself. The
magnetic field is usually visualized by means of "flux lines" (or "lines of force"): when
these flux lines encounter any material, they are reduced or increased by the interaction
between the magnetic field and the material. The original magnetic field is modified
(amplified or attenuated) in the body of the material as a result of this interaction.
The magnetic permeability of a material describes the intensity of this interaction, i.e.,
the degree to which a material can be magnetized. Materials have different degrees of
magnetization:
- ferromagnetic materials are highly magnetizable materials that strengthen the magnetic
field (e.g., iron or nickel),
- paramagnetic materials are weakly magnetizable materials that increase the magnetic
field only marginally (e.g., Al),
- diamagnetic materials are "negatively magnetizable" materials, since they slightly weaken
the applied magnetic field (e.g., Cu, rare gases).
When a magnetic field H [A/m] is externally applied to an object, the field magnetizes
the object to a degree M [Wb/m²] while passing through the body of the object. The
combined effect of the applied magnetic field and the object magnetization produces a total
flux density B, called magnetic induction (measured in Wb/m², or Tesla T):

B = μ₀H + M (18)

where μ₀ = 4π×10⁻⁷ [Wb A⁻¹ m⁻¹] is the permeability of vacuum.
In any atom the electrons orbit and spin, thus behaving like very tiny current loops. As
for any moving charged particle, a magnetic momentum is associated with each electron.
Diamagnetism occurs when the total momentum obtained by adding the contributions of all
electrons is null. The magnetic field applied to a diamagnetic material can induce a
momentum in the material that opposes the applied field.
In a paramagnetic material the total momentum generated by all electrons of an atom is
not null. When a magnetic field is applied, the weak diamagnetic response is dominated by
the atom tendency to align its own momentum with the direction of the applied field.
Diamagnetic and paramagnetic substances are characterized by their magnetic
susceptibility K [Wb A-1 m-1]
R3 = R0(1 + x)

Figure 5: Wheatstone bridge circuit for resistive sensor conditioning

The unknown resistance R3 (i.e., the unknown relative resistance change x with respect to the
fixed value R0) is directly derived from R4 through the scaling factor R2/R1.
If the unknown resistance has to be continuously monitored in time to observe its
variations (and, consequently, the variations of the physical phenomena inducing the
resistivity changes), verifying the bridge balance condition (V0=0) could become a severe
problem.
In these cases a deflection methodology can be adopted. The output voltage can be
expressed as a function of x as follows:
(24)
In the linearity region of the operational amplifier, this conditioning circuit is linear with x,
without any restriction on its amplitude.
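The deflection approach can be illustrated with the plain (unamplified) bridge; the supply voltage and resistor values below are hypothetical, and note that it is the op-amp circuit discussed in the text that removes the residual nonlinearity in x.

```python
def bridge_output(vs, x, r0, r_ratio=1.0):
    """Deflection output of a Wheatstone bridge whose sensing arm is
    R3 = R0 * (1 + x) and whose other arms are fixed (r_ratio = R2/R1).
    This is the plain bridge, nonlinear in x; the op-amp conditioning
    circuit described in the text linearizes the response."""
    r3 = r0 * (1.0 + x)
    return vs * (r3 / (r3 + r0 * r_ratio) - 1.0 / (1.0 + r_ratio))

v0 = bridge_output(vs=5.0, x=0.0, r0=1000.0)  # balanced bridge: output is 0 V
```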
Figure 7: Scheme of the AC bridge for signal conditioning for reactive sensors
For this circuit the following general relationships hold:

VAB = VAC − VBC = V Z2 / (Z1 + Z2) − V Z4 / (Z3 + Z4) (26)

At the equilibrium (VAB = 0), the following relationships hold among the moduli and the
phases of the four impedances:

Z1 Z4 = Z2 Z3, i.e., |Z1||Z4| = |Z2||Z3| and φ1 + φ4 = φ2 + φ3 (27)
Two of the four elements are usually fixed and used as a scaling factor; to evaluate the
unknown impedance, the remaining one is varied until the equilibrium condition
on the output voltage is reached. At this point the conditions reported in Eq. 27 hold and the
unknown impedance can be estimated. As an example, taking Z1 as unknown, in the
following two among the possible alternatives are considered:
R1 = kR2 , X1 = kX2
R1 = kX2 , X1 = kR2
(28)
(29)
In the fuzzy environment a membership function is allowed to assume any value (degree of
membership) in the real interval [0,1].
Fuzzy sets can have any suitable shape and are defined on a Universe of Discourse that
represents the variable itself. A suited mathematical theory has been developed for fuzzy
sets: several operations can be performed, e.g., sum, product, and other Boolean
functions.
Linguistic variables are defined by fuzzy descriptions, each providing the membership
degrees to a fuzzy set defined in the Universe of Discourse U. Fig. 9 shows an example
concerning the definition of temperature.
Fuzzy rules allow for representing dependencies, by means of if-then rules like:
if <antecedent> then <consequence>
For example:

if temperature is AT and pressure is CP then heating is BH

where temperature, pressure, and heating are linguistic variables; AT, CP, and BH are
linguistic values derived from the fuzzy sets defined on the universes of discourse of the
variables.
Systems are described by a set of fuzzy rules (the fuzzy rulebase). Fuzzy inferences are
necessary to determine the actual output for a given input. In Fig. 10 the fuzzy rulebase for
a hypothetical temperature control system is reported.
The rulebase inference consists of several steps. First of all, the degree of membership
for each term of an input variable is determined. Then, the degree of fulfillment for the
entire antecedent is computed by using a "fuzzy AND". Finally, the degree of membership
of the antecedent is applied to the consequent of the rule by using a suited rule (t-norm),
namely the min or the prod operators. These steps are summarized in Fig. 11.
(Example from Fig. 11: T = 140 °C is 0.7 'high', 0.3 'medium', and 0 'low'.)
Figure 10: Fuzzy rulebase of the temperature control system, mapping each combination of a Temperature term (lowT, medT, highT) and a Pressure term to a Heating term (lowH, medH, highH).
The final step is defuzzification. Consequents are aggregated by using the max-operator
to implement union. Then, a crisp output value hCOG is derived from the output membership
function. This last operation is performed by using, for example, the method of the center
of gravity (see Fig. 12):

hCOG = (Σᵢ μᵢ Aᵢ hᵢ) / (Σᵢ μᵢ Aᵢ) (30)

where, for each fuzzy set i, μᵢ is the degree of membership, Aᵢ is the area, and hᵢ is the
center of gravity.
Different rules can be defined according to the form of the consequent. The Mamdani rule
produces fuzzy sets (as in the example shown in Fig. 12). In the Takagi-Sugeno-Kang
(TSK) rule the outputs are functions f(xᵢ) (e.g., if X1 is A1 and X2 is A2 and ... Xn is An
then Y = f(x1, x2, ..., xn)).
A fuzzy algorithm is therefore defined by using a mixed approach that merges operator
experience (empirical knowledge) with map learning (learning from sets of experimental
data). This allows for achieving a much higher flexibility than neural networks in merging
empirical knowledge with experimental data.
A fuzzy algorithm can be summarized in the following steps:
- choose the input variables, i.e., measured quantities that are directly related to the actual
measurand,
- choose the membership functions and their shape for antecedents,
- choose the consequent membership function or the output rules functions,
- tune the consequent values from a set of measured examples,
- adjust the rulebase by adding some rules derived from the experience.
11.2.3 Neuro-fuzzy networks
Neuro-fuzzy networks are architectures similar to neural networks that are suited to
implement an optimized fuzzy system. They typically consist of a five-layer structure; an
example related to the case of two input quantities is shown in Fig. 13. The membership
function parameters in layer 1 and the consequents in layer 4 are determined by using a
learning algorithm. Layers 2, 3, and 5 are fixed and perform fuzzy inferences. Membership
function shape is chosen in advance (e.g., the Gaussian function, characterized by center
and variance).
Layer 1:
- Every node i is an adaptive node.
- O1,i is the membership grade of a fuzzy set A (A1, A2, B1, B2); it specifies the degree to
which the given input satisfies the corresponding attribute A.
- The parameter sets that characterize the membership functions of the fuzzy sets are
referred to as premise parameters.

O1,i = μAi(x), i = 1, 2
O1,i = μBi−2(y), i = 3, 4 (31)

Layer 2:
- Every node is a fixed node that represents the firing strength of each rule (AND, T-norm):

O2,i = wi = μAi(x) μBi(y), i = 1, 2 (32)

Layer 3:
- Every node is a fixed node that computes the ratio between the i-th rule's firing strength
and the sum of all rules' firing strengths; the outputs are called normalized firing strengths:

O3,i = w̄i = wi / (w1 + w2), i = 1, 2 (33)

Layer 4:
- Every node is an adaptive node with function:

O4,i = w̄i fi = w̄i (pi x + qi y + ri) (34)

- The parameter set {pi, qi, ri} is called consequent parameters.

Layer 5:
- The node is a fixed node that computes the overall output as the summation of all
incoming signals:

O5,1 = Σi w̄i fi (35)
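The five-layer forward pass can be sketched for the two-input, two-rule case of Fig. 13. Gaussian membership functions and all parameter values below are illustrative assumptions, not values from the chapter.

```python
import math

def gauss(x, c, s):
    """Gaussian membership function with centre c and width s (layer 1)."""
    return math.exp(-((x - c) / s) ** 2)

def anfis_forward(x, y, premises, consequents):
    """Forward pass of a two-input, two-rule neuro-fuzzy network.
    premises: ((cA1,sA1), (cA2,sA2), (cB1,sB1), (cB2,sB2));
    consequents: ((p1,q1,r1), (p2,q2,r2))."""
    a1, a2, b1, b2 = premises
    w1 = gauss(x, *a1) * gauss(y, *b1)       # layer 2: firing strengths
    w2 = gauss(x, *a2) * gauss(y, *b2)
    n1 = w1 / (w1 + w2)                      # layer 3: normalization
    n2 = w2 / (w1 + w2)
    f1 = consequents[0][0] * x + consequents[0][1] * y + consequents[0][2]
    f2 = consequents[1][0] * x + consequents[1][1] * y + consequents[1][2]
    return n1 * f1 + n2 * f2                 # layers 4-5: weighted sum

out = anfis_forward(0.0, 0.0,
                    premises=((-1, 1), (1, 1), (-1, 1), (1, 1)),
                    consequents=((1, 1, 0), (1, 1, 2)))
```

At the symmetric input (0, 0) both rules fire equally, so the output is the average of the two consequent values.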
Rj: IF (y(k−1) IS Aj1) AND (y(k−2) IS Aj2) AND ... AND (u(k−m) IS Aj,m+n) THEN y(k) = y*j

Figure 15: Fuzzy system modeling: the fuzzy system maps the delayed outputs y(k−1), ..., y(k−n) and inputs u(k−1), ..., u(k−m) to the current output y(k).
Once the membership functions of the input variables u(k) and y(k) of the fuzzy
model and of their regressions (i.e., the previous values to be considered to describe the
system dynamics) have been defined, the antecedent of each rule is completely specified for
each input. The unknowns to be determined for system identification are the rules' output
values y*j (j = 1, ..., R, where R is the number of fuzzy rules Rj used to represent
the system model). For each set x of the input values, the output Y* of the fuzzy system
(i.e., the fuzzy model output) is a linear combination of the rule outputs:

Y* = Σ_{j=1}^{R} wj(x) y*j

where wj(x) is the normalized firing strength of rule Rj for the input x.
As for any model, a fuzzy model also needs to be validated by using a set of data that has
not been used for configuration. In the case of nonlinear systems, the validation data set
should include examples of all system behaviors considered in the learning data set,
although the examples must be different from the ones used for learning. The statistical
properties of the output error must be analyzed to certify the quality of the identified model:
the error should in fact have a Gaussian distribution.
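A crude check of the Gaussianity of the validation error can be based on the sample skewness and excess kurtosis, both of which are near zero for a Gaussian. The residuals below are synthetic stand-ins for an actual validation error sequence.

```python
import math
import random

def gaussianity_check(errors):
    """Return (skewness, excess kurtosis) of the residuals; both should be
    close to zero if the model error is Gaussian."""
    n = len(errors)
    m = sum(errors) / n
    sd = math.sqrt(sum((e - m) ** 2 for e in errors) / n)
    skew = sum(((e - m) / sd) ** 3 for e in errors) / n
    kurt = sum(((e - m) / sd) ** 4 for e in errors) / n - 3.0
    return skew, kurt

# Synthetic residuals standing in for the validation error of a fuzzy model.
random.seed(1)
residuals = [random.gauss(0.0, 0.1) for _ in range(2000)]
skew, kurt = gaussianity_check(residuals)
```

Large values of either statistic would suggest a systematic, non-Gaussian error and hence an inadequate model.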
Soft sensors are therefore models of actual sensors that have been realized by using soft
computing methodologies. These measuring systems are useful for substituting or
cooperating with the real sensors.
11.3. Industrial applications of soft sensors and neural measurement systems
In this section some applications of soft computing methodologies to industrial cases
involving electrical measurements are presented. The basic goal is to show the effectiveness
of this approach to realize highly efficient sensors and measurement systems. Efficiency is
mainly related to the opportunity of using simple sensors that do not need sophisticated and
expensive analog signal conditioning circuits, while derivation of the desired information
from the measurement signals is left to the generalization abilities of neural and
neuro-fuzzy systems. Additional information and examples are available in [8-12].
11.3.1 Neural measurement systems for electrical motor modeling
Asynchronous machines are very interesting both from an academic point of view and for
the industry since they have many applications. Unfortunately, these systems are very
complex, non-linear, and difficult to model and control.
Control strategies based on the flux are difficult to use, since direct and accurate flux
measurement is not feasible. A model of this machine and a device to observe the flux (e.g.,
a soft flux sensor) is therefore needed. Neural networks can be effectively adopted to build
a NARMAX (Nonlinear Autoregressive Moving Average with Exogenous Input) model of
the asynchronous machines and to design nonlinear flux observers (for system modeling
and control see chapters 4 and 5, [13]).
In the literature, a fifth-order d-q model is shown to be suited to represent the behavior of the
asynchronous machine. In state-space form, the machine equations are:
[Fifth-order state-space equations omitted: they express the time derivatives of the rotor fluxes and stator currents in terms of the machine reactances, the supply voltages, and the load torque.]
where ψdr and ψqr are the rotor fluxes; ωr and ωb are the rotor and the base electrical
angular velocity, respectively; ωe is the electrical supply frequency; TL is the load torque; J
is the system inertia; Rr is the rotor resistance referred to the stator; Xs, Xm, and Xr are the
stator self, mutual, and rotor reactance referred to the stator, respectively; p is the number
A white noise signal with maximum value equal to twice the working point was adopted to
produce the learning signals.
For validation, two sets of data have been used: signals not considered during learning,
and output signals obtained with zero input but nonzero initial conditions. This second
operating condition was not explicitly included in the learning set. It must be highlighted
that, during validation and autonomous operation, the outputs of the neural networks are fed
back to provide the signals for the subsequent iterations.
In Fig. 16 the validation results of the output ids are shown for these two operating
conditions, for the case of three separate networks corresponding to the three separate
outputs, each with 6 hidden units. The model adequately describes the system behavior with the
white noise input, since the maximum error is less than 10% (Fig. 16a); however, the
adopted neural model does not have sufficient generalization abilities, since the error with
nonzero initial conditions (Fig. 16b) is too large.
By pruning, the hidden layer was optimized and reduced to 3 units, thus removing
unnecessary degrees of freedom that were not properly configured by learning: validation
now showed a much better model behavior also in the second operating condition (Fig. 17).
Neural networks have also been used to realize a nonlinear flux observer that estimates
the flux from indirect measurements [8].
To estimate the flux, the input vector to be presented to the neural network is:
Figure 16: Neural model validation for the output ids with 6 hidden neurons and autonomous evolution:
(a) white noise input, (b) nonzero initial conditions and zero input
(continuous line: system output; dashed line: model output).
Figure 17: Validation for the neural model with 3 hidden neurons in autonomous evolution
(continuous line: system output; dashed line: model output)
Figure 19: The measured and the Gummel-Poon simulated output current for power BJTs.
Two approaches were considered to obtain a model for Icomp: polynomial and neural
models. Although they have several limitations, polynomial models can be implemented in the
SPICE simulator in a straightforward way. Neural models may have higher flexibility and
accuracy, although they may have higher computational complexity; experiments have shown
good results by using multilayer perceptrons with one 5-unit hidden layer. In Fig. 21 the
measured data and the different models are shown: the advantage of using the compensated
model (i.e., the compensating controlled current generator) is relevant.
The importance of using more accurate models becomes even more evident when they
are used to simulate complex devices that include the modeled component. A first example
is the simulation of a Darlington transistor: the output characteristics of this device obtained
with the various BJT models are shown in Fig. 22.
Another, much more complex, example that includes the power transistor model is the
industrial electronic ignition device developed in the STMicroelectronics laboratories. The
measured output voltage applied to the inductor is shown in Fig. 23a: the much higher
accuracy of the neural-enhanced model with respect to the classical Gummel-Poon model
can be observed in Fig. 23b.
Figure 22: Output characteristics of an industrial Darlington transistor based on the various BJT models.
Figure 23: The voltage applied to the output inductor in the STMicroelectronics electronic ignition circuit:
(a) the measured voltage, and (b) the simulation results by using the classical SPICE model
and the neural-enhanced model for the power transistors.
The wire temperature influences the wire resistivity (e.g., resistivity generally increases
with temperature in electrical conductors).
When a thermal equilibrium condition is reached, the flow velocity can be related to a
resistance measurement. At the equilibrium condition, the energy balance gives:
I² Rw = h A (Tw − Tf) (43)

where I is the electrical current in the wire, Rw is the wire resistance, Tw is the wire
temperature, Tf is the fluid temperature, h is the heat transfer coefficient of the wire film,
and A is the heat transfer surface. The h coefficient is given by King's law:

h = C0 + C1 √v (44)
where C0 and C1 are suited coefficients, and v is the flow velocity. In Fig. 24 experimental
measurements from the hot-wire sensor are shown for constant flow temperature: the
voltage is proportional to the resistance since a constant current is driven into the sensor.
It must be noted that the reading increases with the flow velocity while the sensor
temperature lowers; this is due to the fact that a semiconductor (NTC) resistor has been used
here as the hot-wire sensor.
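Eqs. 43-44 can be inverted to recover the flow velocity from a resistance reading. All coefficient values in the sketch below are illustrative assumptions, not taken from the chapter.

```python
def flow_velocity(i, r_w, t_w, t_f, area, c0, c1):
    """Invert the equilibrium balance I^2 Rw = h A (Tw - Tf) together with
    King's law h = C0 + C1 * sqrt(v) to recover the flow velocity v."""
    h = i ** 2 * r_w / (area * (t_w - t_f))
    return ((h - c0) / c1) ** 2

# Hypothetical operating point: 50 mA through a 100 ohm wire held 40 K above
# the fluid, with illustrative King's-law coefficients.
v = flow_velocity(i=0.05, r_w=100.0, t_w=60.0, t_f=20.0, area=1e-4,
                  c0=10.0, c1=15.0)
```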
A resistance measurement therefore allows for estimating the flow velocity. However,
the wire resistance depends also on the flow temperature (see Eq. 43). Compensation of this
interfering effect is needed. The scheme of the measurement system is shown in Fig. 25:
two Negative Temperature Coefficient (NTC) resistive sensors (thermistors) are used. The
main sensor NTC2 is located to sense the fluid velocity: its output thus depends both on the
fluid velocity and the fluid temperature. The other resistive sensor NTC1 is located in a
"static" environment so that it is sensitive only to the fluid temperature and does not
experience convective heat loss.
Figure 25: Hot-wire flow measurement system with flow temperature compensation.
The analog signal conditioning circuit for this measurement system, based on conventional
approaches, is shown in Fig. 26. The output voltage is:

(45)

where the gain K0 is set by the resistors R4, R10, and R11. The term K3VZ in Eq. 45 is used
to set the working point of the output voltage. The effectiveness of the flow temperature
compensation system is shown in Fig. 27.
Figure 27: Effect of thermal compensation on the output signal of the flow measurement system.
The output voltage of the measurement system is finally shown in Fig. 28. The experimental
data were fitted by using a polynomial model:

Y = aX² + bX + c (46)

whose coefficients were estimated to be a = −0.0053 V·s²/m², b = 0.245 V·s/m, and
c = 0.102 V.
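Eq. 46 with the estimated coefficients can be evaluated directly; the 10 m/s evaluation point below is an arbitrary illustration.

```python
def flow_output_voltage(x, a=-0.0053, b=0.245, c=0.102):
    """Eq. 46: fitted quadratic model Y = a*X^2 + b*X + c relating the
    output voltage Y [V] to the flow velocity X [m/s]."""
    return a * x ** 2 + b * x + c

y_10 = flow_output_voltage(10.0)  # output voltage at a 10 m/s flow
```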
Instead of using sophisticated analog signal conditioning circuits, soft computing can be
exploited to create a soft sensor that reads directly the primary sensor outputs and computes
the output voltage as a function only of the flow velocity, without any dependence on the
flow temperature.
The input signals are the voltages across the two thermistors NTC1 and NTC2. Sixteen
fuzzy sets were associated to each of the input voltages. The global set of fuzzy rules and,
hence, the number of output fuzzy sets has been determined through the fuzzy identification
method summarized in Section 11.2.3; the system consists of 121 rules.
In Fig. 29 the output of the soft sensor is compared to the readings of a reference flow
meter. Although the fuzzy measurement system still makes use of the hot-wire approach,
the required components are simple and no sophisticated analog electronics is needed for
signal conditioning.
Figure 29: Estimation of the flow velocity by using the soft sensor based on fuzzy systems (uneven line),
compared to the output of a reference flow meter.
References
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
Chapter 12
Neural Networks
for Measurement and Instrumentation
in Virtual Environments
Emil M. PETRIU
School of Information Technology and Engineering, University of Ottawa
800 King Edward, Ottawa, Ontario, Canada, K1N6N5
Abstract. Neural Networks (NNs), which are able to learn nonlinear behaviors from
a limited set of measurement data, can provide efficient modeling solutions for
many virtual reality applications. Due to their continuous memory behavior, NNs
are able to provide instantaneously an estimation of the output value for input values
that were not part of the initial training set. Hardware NNs consisting of a collection
of simple neuron circuits provide the massive computational parallelism allowing
for higher speed real-time models. A virtual prototyping environment for Electronic
Design Automation (EDA) and a NN model for the 3D electromagnetic field are
discussed in a representative case study.
12.1. Introduction
Virtual Reality (VR) is a computer-based mirror of the physical reality. Synthetic and
sensor-based, computer representations of 3D objects, sounds and other physical reality
manifestations are integrated in a multi-media Virtual Environment (VE), or virtual world,
residing inside the computer. Virtual environments are dynamic representations where
objects and phenomena are animated/programmed by scripts, by simulations of the laws of
physics, or driven interactively directly by human operators and other real world objects
and phenomena, Fig. 1.
The original VR concept has evolved finding practical applications in a variety of
domains such as industrial design, multimedia communications, telerobotics, medicine,
and entertainment.
Distributed Virtual Environments (DVEs) run on several computers connected over a
network allowing people to interact and collaborate in real time, sharing the same virtual
worlds. Collaborative DVEs require a broad range of networking, database, graphics, world
modeling, real-time processing and user interface capabilities, [1].
Virtualized Reality Environment (VRE), [2], is a generalization of the essentially
synthetic VE concept. While still being a computer based world model, the VRE is a
conformal representation of the mirrored real world based on sensor information about the
real world objects and phenomena. Augmented Reality (AR) allows humans to combine
their intrinsic reactive-behavior with higher-order world model representations of the
immersive VRE systems. A Human-Computer Interface (HCI) should be able to couple the
human operator and the VRE as transparently as possible. VRE allow for no-penalty
training of the personnel in a variety of industrial, transportation, military, and medical
applications.
Figure 1: The virtual world and the real world are coupled through virtual-world/real-world interfaces (animation scripts, motion tracking, object recognition) and human-computer interfaces.
There are many applications such as remote sensing and telerobotics for hazardous
environments requiring complex monitoring and intervention, which cannot be fully
automated. A proper control of these operations cannot be accomplished without some AR
telepresence capability allowing the human operator to experience the feeling that he/she is
virtually immersed in the working environment. In such cases, human operators and
intelligent sensing and actuator systems are working together as symbionts, Fig. 2, each
contributing the best of their specific abilities, [3,4].
VR methods are also successfully used in concurrent engineering design. The
traditional approach to the product development is based on a two-step process consisting
of a Computer Aided Design (CAD) phase followed by a physical prototype-testing phase.
The limitations of this approach are getting worse as the design paradigm shifts from a
sequential domain-by-domain optimization to a multi-domain concurrent design exercise.
VR methods allow simulating the behavior of complex systems for a wide variety of initial
conditions, excitations and systems configurations - often in a much shorter time than
would be required to physically build and test a prototype experimentally. Virtual
Prototyping Environment (VPE) design methods could be used to conduct interactive
what-if experiments on a multi-domain virtual workbench. This results in a shorter product
development process than the classical approach, which requires a series of physical
prototypes to be built and tested.
12.2. Modeling natural objects, processes, and behaviors for real-time virtual
environment applications
VREs and VPEs depend on the ability to develop and handle conformable (i.e., very close
to the reality) models of the real world objects and phenomena. The quality and the degree
of the approximation of these models can be determined only by validation against
experimental measurements. The convenience of a model is determined by its ability to
allow for extensive parametric studies, in which independent model parameters can be
modified over a specified range in order to gain a global understanding of the response.
Advanced computation techniques are needed to reduce the execution time of the
models used in interactive VPE applications when analysis is coupled with optimization,
which may require hundreds of iterations.
Model development problems are compounded by the fact that the physical systems
often manifest behaviors that cannot be completely modeled by well-defined analytic
techniques. Non-analytical representations obtained by experimental measurements have to
be used to complete the description of these systems.
Most of the object models used in virtual environments are discrete. The objects are
represented by a finite set of 3D sample points, or by a finite set of parametric curves,
stored as Look Up Tables (LUTs). The fidelity of these discrete models is proportional
to the cardinality of the finite set of samples or parametric curves. The size of the
corresponding LUTs is not a matter of concern thanks to the relatively low cost of today's
RAM circuits. However, the main drawback of these discrete models is the supplementary
time needed to calculate, by interpolation, the parameters of each point that is not a
sample point. This increases the response time of the models, which in turn affects
the real-time performance of the interactive virtual environment.
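The interpolation overhead described above can be sketched as follows; this is a minimal illustrative 1-D LUT model (the sample data and the function name `lut_eval` are invented for the example, not taken from the chapter):

```python
import bisect

# Hypothetical discrete object model: a 1-D look-up table (LUT) of
# sample points (x, f(x)); here the samples happen to be f(x) = x**2.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.0, 0.25, 1.0, 2.25, 4.0]

def lut_eval(x):
    """Evaluate the LUT model; non-sample points need extra
    interpolation work, which lengthens the model's response time."""
    i = bisect.bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 2))       # clamp to a valid segment
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])  # linear interpolation

print(lut_eval(1.0))    # a sample point: exact value 1.0
print(lut_eval(1.25))   # between samples: interpolated estimate 1.625
```

A trained NN model, by contrast, would answer any query with the same fixed-cost forward pass, with no per-query search and interpolation step.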
Higher efficiency models could be implemented using NNs that can learn nonlinear
behaviors from a limited set of measurement data, [5,6]. Despite the fact that the training
set is finite, the resulting NN model has a continuous behavior similar to that of an analog
computer model. An analog computer solves the linear or nonlinear differential and/or
integral equations representing the mathematical model of a given physical process. The
coefficients of these equations must be exactly known, as they are used to program the
coefficient-potentiometers of the analog computer's Op Amps. The analog computer
doesn't follow a sequential computation procedure: all its computing elements perform
simultaneously and continuously. Because of the difficulties inherent in analog
differentiation, the equation is rearranged so that it can be solved by integration rather than
differentiation, [7]. A Neural Network does not require a prior mathematical model. A
learning algorithm is used to adjust, sequentially by trial and error during the learning
phase, the synaptic weights of the neurons. Like the analog computer, the NN does not
(1)
The estimation accuracy of the recovered value for V depends on the quantization
resolution Δ, on the finite number of samples that are averaged, and on the statistical
properties of the dither R.
Because of the computational and functional similarity of a neuron and a correlator, we
found it useful to consider the relative speed performance figures for correlators with
different quantization levels given in Table 1, [17].
For instance, a basic 2-level (1-bit) random-pulse correlator will be 72.23 times slower
than an ideal analog correlator calculating with the same accuracy the correlation function
of two statistically independent Gaussian noise signals with amplitudes restricted within
±3σ. A 3-level (2-bit) correlator will be 5.75 times, and a 4-level (2-bit) correlator will be
2.75 times, slower than the analog correlator.
Figure 3: Multi-bit analog/random-data conversion.
Table 1: Relative speed performance for correlators with different quantization levels.

Quantization levels | Times slower than the ideal analog correlator
2 (1-bit)           | 72.23
3 (2-bit)           | 5.75
4 (2-bit)           | 2.75
analog              | 1
Based on these relative performance figures we have opted for a NN architecture using
a 3-level generalized random-data representation, produced by a dithered 2-bit dead-zone
quantizer. This gives, in our opinion, a good compromise between the processing speed and
the circuit complexity, [18-20].
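A behavioral sketch of such a dithered 3-level quantizer is given below. The ±0.5 thresholds, the uniform dither range, and the function name are illustrative assumptions, not the chapter's exact circuit; the point is that averaging the 3-level random data recovers the deterministic input:

```python
import random

def dead_zone_quantize(v, r):
    """3-level (2-bit) dead-zone quantizer with additive dither r in
    [-0.5, 0.5): output is -1, 0 or +1 (illustrative thresholds)."""
    s = v + r
    if s >= 0.5:
        return 1
    if s < -0.5:
        return -1
    return 0

random.seed(1)
v = 0.3                        # deterministic component, |v| <= 1 assumed
N = 200_000
avg = sum(dead_zone_quantize(v, random.uniform(-0.5, 0.5))
          for _ in range(N)) / N
print(round(avg, 2))           # close to v = 0.3
```

With uniform dither, the expected value of the quantizer output equals v, so the long-run average of the random data is an unbiased estimate of the analog value.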
Random-data/analog conversion allows the deterministic component V of the
random-data sequence to be estimated as an average V*N over the finite set of N random
data {VRPi | i=1,2,...,N}. This can be done using a moving average algorithm, [21,22]:

V*N(k) = (1/N) Σi=0..N-1 VRP(k-i) = V*N(k-1) + [VRP(k) - VRP(k-N)]/N    (2)
While classical averaging requires the addition of N data, this iterative algorithm
requires only one addition and one subtraction. The price for this simplification is the need
for a shift register storing the whole set of the most recent N random data. Fig. 4 shows the
mean square error of V*N, calculated over 256 samples, as a function of the size of the
moving average window for 1-bit and 2-bit quantization, respectively.
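The iterative update above can be sketched in a few lines; the shift register is modeled by a deque (the function name and test data are invented for the example):

```python
from collections import deque

def moving_average_stream(samples, N):
    """Iterative moving average: each step needs one addition and one
    subtraction, plus a shift register (here a deque) holding the most
    recent N random data, as the trade-off described in the text."""
    window = deque([0.0] * N)     # shift register initialised to zero
    acc = 0.0
    out = []
    for x in samples:
        acc += x - window.popleft()   # add newest, subtract oldest
        window.append(x)
        out.append(acc / N)
    return out

# Once the register is full, the iterative estimate equals the
# classical N-sample mean.
data = [1, 0, 1, 1, -1, 0, 1, 0]
est = moving_average_stream(data, 4)
print(est[-1], sum(data[-4:]) / 4)   # both 0.0
```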
Figure 4: Mean square error of the moving average algorithm as a function of the window size.
Figure 5: Simulations illustrating the analog/random-data and random-data/digital conversions.
One of the most attractive features of the random-data representation is that simple
logical operations on individual pulses allow arithmetic operations to be carried out on the
analog variables represented by their respective random-pulse sequences, [15]. This
feature is still present in the case of low-bit random-data representations.
The arithmetic addition of m signals {xi | i=1,2,...,m} represented by their b-bit
random-data {Xi | i=1,2,...,m} is carried out, as shown in Fig. 6, by time multiplexing the
randomly decimated incoming random-data streams. The decimation is controlled by
uniformly distributed random signals {Si | i=1,2,...,m} with p(Si) = 1/m. This random
sampling removes unwanted correlations between sequences with similar patterns, [10].
The random-data output sequence Z represents the resulting sum signal
z = (x1 + ... + xm)/m.
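The multiplexed addition can be simulated as below. The 3-level encoding is the dithered dead-zone quantization sketched earlier in the chapter, and the helper names are invented for the example; selecting one of the m streams per clock with probability 1/m makes the output stream represent the scaled sum:

```python
import random

rng = random.Random(42)

def encode(x):
    """One 3-level random datum representing x in [-1, 1]
    (dithered dead-zone quantization; illustrative thresholds)."""
    s = x + rng.uniform(-0.5, 0.5)
    return 1 if s >= 0.5 else (-1 if s < -0.5 else 0)

def add_streams(xs, n):
    """Arithmetic addition by random multiplexing: at each clock one
    of the m incoming streams is selected with probability 1/m, so the
    output stream represents z = (x1 + ... + xm) / m."""
    m = len(xs)
    return [encode(xs[rng.randrange(m)]) for _ in range(n)]

xs = [0.6, -0.2, 0.2]              # m = 3 input signals
z = add_streams(xs, 200_000)
print(round(sum(z) / len(z), 2))   # close to (0.6 - 0.2 + 0.2)/3 = 0.2
```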
Figure 6: Circuit performing the arithmetic addition of m signals represented
by the b-bit random-data streams X1, X2,..., Xm
Table: 3-level 2-bit random-data multiplication Z = X·Y (codes: 00 = 0, 01 = +1, 10 = -1).

X \ Y   | -1 (10) | 0 (00) | +1 (01)
-1 (10) | +1 (01) | 0 (00) | -1 (10)
0 (00)  | 0 (00)  | 0 (00) | 0 (00)
+1 (01) | -1 (10) | 0 (00) | +1 (01)
Fig. 7 shows the resulting logic circuit for this 3-level 2-bit random data multiplier.
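A behavioral model of this multiplier can be sketched as follows. The 2-bit code assignment (00 for 0, 01 for +1, 10 for -1) is an assumption consistent with the table above, not a confirmed detail of the chapter's circuit:

```python
# Hypothetical bit-level model of the 3-level 2-bit multiplier.
# Assumed codes: 00 -> 0, 01 -> +1, 10 -> -1 (sign-magnitude style).
CODE = {0b00: 0, 0b01: 1, 0b10: -1}
ENC = {v: c for c, v in CODE.items()}

def multiply(cx, cy):
    """Datum-by-datum product of two 3-level random-data codes:
    decode to {-1, 0, +1}, multiply, re-encode. In hardware this
    reduces to simple gates (zero if either magnitude bit pair is 00,
    sign given by the XOR of the sign bits)."""
    return ENC[CODE[cx] * CODE[cy]]

# Check the full truth table against ordinary signed multiplication.
for cx in CODE:
    for cy in CODE:
        assert CODE[multiply(cx, cy)] == CODE[cx] * CODE[cy]
print("truth table consistent")
```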
Figure 9: Auto-associative memory NN architecture.
Fig. 10 shows as an example three training patterns, which represent the digits {0,1,2}
displayed on a 6x5 grid. Each white square is represented by a "-1", and each black square
is represented by a "1". To create the input vectors, we scan each 6x5 grid one column at a
time. The weight matrix in this case is W = P1 P1^T + P2 P2^T + P3 P3^T.
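This Hebbian construction can be sketched with a toy version of the memory. The 6-element bipolar patterns below are invented for the example (the chapter's patterns are 30-element digit grids), and recall is a single hard-limited pass through W:

```python
# Tiny auto-associative memory: W = P1 P1^T + P2 P2^T for two
# orthogonal bipolar patterns (illustrative 6-element vectors).
P = [[1, 1, -1, -1, 1, 1],
     [1, -1, 1, -1, 1, -1]]

n = len(P[0])
W = [[sum(p[i] * p[j] for p in P) for j in range(n)] for i in range(n)]

def recall(x):
    """One synchronous pass: hard-limit the weighted sums to {-1, +1}."""
    return [1 if sum(W[i][j] * x[j] for j in range(n)) >= 0 else -1
            for i in range(n)]

noisy = [-1, -1, -1, -1, 1, 1]      # P[0] with its first two elements flipped
print(recall(noisy) == P[0])        # the stored pattern is recovered
```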
In addition to recognizing all the patterns of the initial training set, the auto-associative
NN is also able to deal with up to 30% noise-corrupted patterns as illustrated in Fig. 11.
Electronic components are placed on a PCB, where they are interconnected according to
functional CAD specifications and to design constraints taking into consideration the EM
interference between the PCB layout components. However, this design phase does not take
into consideration the interference due to the EM and thermal fields radiated by the
integrated circuits and other electronic components. These problems are identified and
ironed out during the prototyping phase, which may take several what-if iterations until an
acceptable circuit placement and connection routing solution is found. Traditionally this
phase involves building and testing a series of physical prototypes, which may take
considerable time.
Such a multi-domain virtual workbench allows what-if circuit-placement experiments to
be conducted more expediently, in a concurrent engineering manner. The VPE is able to
detect collisions between the safety-envelope of the circuit currently manipulated by the
manipulator dragger and the safety-envelopes of other objects in the scene. When a
collision is detected, the manipulated circuit returns to its last position before the collision.
Virtual prototyping allows a designer to test the prototype's behavior under a wide
variety of initial conditions, excitations and system configurations. This results in a shorter
product development process than the classical approach, which requires a series of
physical prototypes to be built and tested, Fig. 14.
[Figure 14 diagram: electronics design-cycle steps (floorplan & partition, trade-off analysis, design optimization, placement, routing, analysis, prototype) leading to an acceptable design in 10-14 weeks, compared with the virtual prototyping cycle.]
Figure 14: Product design cycles for traditional and virtual prototyping, respectively.
A key requirement for any VPE is the development of conformable models for all the
physical objects and phenomena involved in that experiment. Neural networks, which can
incorporate both analytical representations and descriptions captured by experimental
measurements, provide convenient real-time models for the EM fields radiated by a variety
of electronic components.
Figure 16: The training data are obtained analytically by calculating far-field values
from near-field data using the finite element method.
Figure 17: NN model of the 3D EM field radiated by the dielectric-ring resonator antenna.
(4)
in a homogeneous volume V bounded on one side by a surface where the magnetic field
values of H are known through measurements, and on the other side by the ground plane.
An explicit solution, proposed in [27], allows the magnetic field H to be evaluated
anywhere in the volume V from its field values and its derivatives on a surface S1:
(5)
where S1 is the closed surface on which measurements are made, n is the normal to S1, and
G(r,r') is the free space Green's function.
This algorithm is independent of the type of radiation. While it shares some sources of
error with other transform algorithms, the integral transform employed here is more
immune to aliasing errors than the FFT-based algorithms. Another advantage over
conventional FFT transforms is that the far-field results are available everywhere and not
only at discrete points.
The EM field measurement system, [28], is shown in Fig. 18. It consists of a turning
table with a highly conducting grounded surface on which the DUT is resting. The EM field
probe can be positioned anywhere on a 90° arc of circle above the turning table.
A special interface was developed for the control of the probe positioning and the
collection of the measurement data via a spectrum analyzer. The turning table and the
probe can be positioned as desired by steering them with position-servoed cables driven by
motors placed outside an anechoic enclosure. The probe positioning system and the
steering cables are made out of non-magnetic and non-conductive material in order to
minimize disturbance of the DUT's fields. EM field measurements are taken on both
hemispherical surfaces, providing data for the interpolative calculation of the derivatives'
variation on the surface S1. The surfaces are closed with their symmetric image halves.
This is possible due to the presence of the ground plane.
The actual angular positions of the table and of the probe are measured using a
video camera placed outside the enclosure. The azimuth angle φ is recovered by encoding
the periphery of the turning table with the terms of a 63-bit pseudorandom binary sequence,
[29]. This arrangement allows the 3D position parameters of the EM probe to be completely
identified while it scans the NF around the DUT.
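The key property of such a 63-bit sequence is that every window of 6 consecutive bits is unique, so reading 6 adjacent code marks identifies the absolute azimuth. A sketch with a 6-stage LFSR follows; the feedback polynomial x^6 + x + 1 is one primitive choice, assumed here since the chapter does not state its polynomial:

```python
def prbs63():
    """63-bit maximal-length sequence from a 6-stage LFSR with
    feedback polynomial x^6 + x + 1 (an assumed primitive choice)."""
    state = 0b000001
    bits = []
    for _ in range(63):
        bits.append(state & 1)
        fb = ((state >> 5) ^ (state >> 4)) & 1
        state = ((state << 1) | fb) & 0b111111
    return bits

seq = prbs63()
# Every 6-bit window (read circularly) is distinct, so 6 consecutive
# code marks on the turning-table periphery determine the position.
windows = {tuple(seq[(i + k) % 63] for k in range(6)) for i in range(63)}
print(len(windows))   # 63 distinct windows: position fully determined
```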
12.5. Conclusions
Virtual environment technology has found applications in a variety of domains such as
industrial design, multimedia communications, telerobotics, medicine, and entertainment.
Virtualized Reality environments, which provide more conformal representations of
the mirrored physical world objects and phenomena, are valuable experimental tools for
many industrial applications. Virtual environment efficiency depends on the ability to
develop and handle conformable models of the physical objects and phenomena. As real
world objects and phenomena more often than not manifest behaviors that cannot be
completely modeled by analytic techniques, there is a need for non-analytical models
driven by experimental measurements. Neural networks, which are able to learn nonlinear
behaviors from a limited set of measurement data, can provide efficient modeling solutions
for many virtualized reality applications.
Due to their continuous, analog-like, behavior, NNs are able to provide instantaneously
an estimation of the output value for input values that were not part of the initial training
set. Hardware NNs consisting of a collection of simple neuron circuits provide the massive
computational parallelism allowing for even higher speed real-time models.
[1] R.C. Waters and J.W. Barrus, "The Rise of Shared Virtual Environments," IEEE Spectrum, Vol. 34, No. 3, pp. 18-25, March 1997.
[2] T. Kanade, Virtualized Reality, http://www.cs.cmu.edu/~virtualized-reality/. Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA.
[3] R.W. Picard, "Human-Computer Coupling," Proc. IEEE, Vol. 86, No. 8, pp. 1803-1807, Aug. 1998.
[4] E.M. Petriu and T.E. Whalen, "Computer-Controlled Human Operators," IEEE Instrum. Meas. Mag., Vol. 5, No. 1, pp. 35-38, 2002.
[5] C. Alippi and V. Piuri, "Neural Methodology for Prediction and Identification of Non-linear Dynamic Systems," in Instrumentation and Measurement Technology and Applications, (E.M. Petriu, Ed.), pp. 477-485, IEEE Technology Update Series, 1998.
[6] C. Citterio, A. Pelagotti, V. Piuri, and L. Roca, "Function Approximation - A Fast-Convergence Neural Approach Based on Spectral Analysis," IEEE Tr. Neural Networks, Vol. 10, No. 4, pp. 725-740, July 1999.
[7] A.S. Jackson, Analog Computation, McGraw-Hill Book Co., 1960.
[8] J. von Neumann, "Probabilistic logics and the synthesis of reliable organisms from unreliable components," in Automata Studies, (C.E. Shannon, Ed.), Princeton, NJ, Princeton University Press, 1956.
[9] B.P.T. Veltman and H. Kwakernaak, "Theorie und Technik der Polaritatkorrelation fur die dynamische Analyse niederfrequenter Signale und Systeme," Regelungstechnik, vol. 9, pp. 357-364, Sept. 1961.
[10] B.R. Gaines, "Stochastic computer thrives on noise," Electronics, pp. 72-79, July 1967.
[11] "SEM Electronic Correlator," NORMA Messtechnik Tech. Doc., PM 4707E.
[12] A. Hamilton, A.F. Murray, D.J. Baxter, S. Churcher, H.M. Reekie, and L. Tarasenko, "Integrated pulse stream neural networks: results, issues, and pointers," IEEE Trans. Neural Networks, vol. 3, no. 3, pp. 385-393, May 1992.
[13] M. van Daalen, T. Kosel, P. Jeavons, and J. Shawe-Taylor, "Emergent activation functions from a stochastic bit-stream neuron," Electron. Lett., vol. 30, no. 4, pp. 331-333, Feb. 1994.
[14] E. Petriu, K. Watanabe, T. Yeap, "Applications of Random-Pulse Machine Concept to Neural Network Design," IEEE Trans. Instrum. Meas., Vol. 45, No. 2, pp. 665-669, 1996.
[15] S.T. Ribeiro, "Random-pulse machines," IEEE Trans. Electron. Comp., vol. EC-16, no. 3, pp. 261-276, June 1967.
[16] F. Castanie, "Signal processing by random reference quantizing," Signal Processing, vol. 1, no. 1, pp. 27-43, 1979.
[17] K.-Y. Chang and D. Moore, "Modified digital correlator and its estimation errors," IEEE Trans. Inf. Theory, vol. IT-16, no. 6, pp. 699-706, 1970.
[18] E. Petriu, "Contributions to the improvement of correlator performance," Dr. Eng. Thesis, Polytechnic Institute of Timisoara, Romania, (in Romanian), 1978.
[19] E. Pop, E.M. Petriu, "Influence of Reference Domain Instability Upon the Precision of Random Reference Quantizer with Uniformly Distributed Auxiliary Source," Signal Processing (EURASIP), North Holland, Vol. 5, pp. 87-96, 1983.
[20] L. Zhao, "Random Pulse Artificial Neural Network Architecture," M.A.Sc. Thesis, OCIECE, University of Ottawa, Canada, 1998.
[21] A.J. Miller, A.W. Brown, and P. Mars, "Moving-average output interface for digital stochastic computers," Electron. Lett., vol. 10, no. 20, pp. 419-420, Oct. 1974.
[22] A.J. Miller and P. Mars, "Optimal estimation of digital stochastic sequences," Int. J. Syst. Sci., vol. 8, no. 6, pp. 683-696, 1977.
[23] E.M. Petriu, M. Cordea, and D.C. Petriu, "Virtual Prototyping Tools for Electronic Design Automation," IEEE Instrum. Meas. Mag., Vol. 2, No. 2, pp. 28-31, 1999.
[24] E.M. Petriu, M. Cordea, D.C. Petriu, Lou McNamee, "Modelling Issues in Virtual Prototyping Environments," Proc. VIMS'99, IEEE Workshop on Virtual and Intelligent Measurement Systems, pp. 1-5, Venice, Italy, May 1999.
[25] I. Ratner, H.O. Ali, and E. Petriu, "Neural Network Simulation of a Dielectric Ring Resonator Antenna," J. Systems Architecture, vol. 44, pp. 569-581, 1998.
[26] R. Laroussi and G.I. Costache, "Far-Field Predictions from Near-Field Measurements," IEEE Tr. Electromagnetic Compatibility, Vol. 36, No. 3, pp. 189-195, 1994.
[27] A.J. Poggio and E.K. Miller, "Integral equation solutions of three dimensional scattering problems," in Computer Techniques for Electromagnetics, Mittra R., ed., Pergamon Press, Oxford, 1973.
[28] A. Roczniak, E. Petriu, and G.I. Costache, "3-D Electromagnetic Field Modeling Based on Near Field Measurements," Proc. IMTC/96, IEEE Instrum. Meas. Technol. Conf., pp. 1124-1127, Brussels, Belgium, 1996.
[29] E. Petriu, W.S. McMath, S.K. Yeung, N. Trif, and T. Bieseman, "Two-Dimensional Position Recovery for a Free-Ranging Automated Guided Vehicle," IEEE Trans. Instrum. Meas., Vol. 42, No. 3, pp. 701-706, 1993.
Chapter 13
Neural Networks in the Medical Field
Marco PARVIS, Alberto VALLAN
Dipartimento di Elettronica, Politecnico di Torino
Corso Duca degli Abruzzi 24, 10129 Torino, Italy
Abstract. This chapter, after an overview of the most important applications where
neural networks play an important role in medical diagnosis, discusses a possible
approach which can be used to tackle both the presence of uncertainty and the
reduced number of available training examples. A set of examples drawn from the
medical field is then presented.
13.1. Introduction
This chapter is basically divided into two main sections which are related to the role of
neural networks in the medical field and to the prediction of the output uncertainty of medical
instruments that embed neural networks.
13.2. Role of neural networks in the medical field
The purpose of this section is to investigate where and how neural networks are used in
the medical field. The section is divided into three subsections devoted to instrumentation
for diagnosis purposes, to decision making instruments, and to the available databases,
which represent an invaluable tool for the validation of algorithms employed in the medical
field.
13.2.1 Neural networks in medical instrumentation for diagnosis purposes
Although several different kinds of instruments have been developed, it is usually possible
to identify within each instrument a common structure similar to the one shown in fig. 1.
The quantity we are interested in, the measurand, is a physical, chemical or a biological
quantity and represents the input of the instrument. The instrument converts the input
quantity into a numerical value that can be employed either by physicians, in order to
perform a diagnosis, or by other instruments, in order to perform more complex processing,
for example to extract more meaningful features or even, as will be described later, to
automatically perform the diagnosis.
The figure shows several blocks, which are required for a correct instrument operation,
but three of them play a fundamental role as they are on the input/output path: the sensors,
the signal conditioning and conversion and the signal processing.
Neural networks are rarely employed in either the sensor or the A/D conversion sections,
since these sections are inherently analogue and realized with conventional devices
[1], while it is not uncommon to find neural networks in the signal processing section.
A noticeable exception to this scheme is represented by the so-called smart sensors (see
chapter 4), where neural networks can be used for linearization or for data-fusion purposes,
but such smart devices can be thought of as examples of complete though simplified
instruments and will not be described here.
Neural networks can be used in several ways and it is almost impossible to list all the
different uses. Table 1 shows, as an example, some of the most common uses, clustered
by network topology. As one can see, most examples deal with either signal filtering or
classification issues.
The wide use of neural networks for filtering applications can be explained by
remembering that most of the signals encountered in the medical field have particular
characteristics, which make the filter design difficult:
- the signal to noise ratio (SNR) can be very poor;
- the spectral content greatly depends on the patient and is often time-dependent;
- the useful spectral band is often overlapped with the noise band;
- unwanted signals, such as mains interferences and electromyographic signals, are
often present;
- sometimes artefact signals are present, which can be correlated with the signal we are
interested in.
Furthermore, because of the non-linear behavior of the "human system", the noise is often
non-Gaussian and non-additive; this adds other constraints to the filter choice and leads to
situations that can take advantage of the neural network features.
In fact, neural networks can describe non-linear phenomena and can be designed to
implement adaptive filters that tune their parameters by means of the network learning
algorithms.
An interesting and clarifying example can be found in [16], where a neural-network-based
QRS detector is described. The identification of the QRS complex (see fig. 5 in the next
section) is often employed as a reference point to extract the ST segment and is also
mandatory for many other ECG analyses [17]. Traditional techniques for QRS identification
employ a band-pass filter in order to improve the signal to noise ratio and then identify the
signal peak. These simple techniques work well only in the presence of moderate noise.
Unfortunately the acquired ECG signals are very low (a few millivolts) and sometimes are
severely corrupted by several other unwanted signals, such as muscle signals, mains
interference (50 Hz or 60 Hz) and electrical noise. Furthermore, the QRS spectral content is
time-varying and often signal and disturbance spectra are overlapped, so that traditional
filtering techniques would risk corrupting the signal.
In order to overcome these problems, the QRS detector proposed in [16] employs
a non-linear adaptive matched filter. Adaptive filters work well in the presence of
non-stationary signals because they adapt themselves to the signal changes, and the filter
non-linearities are useful when the input signals are generated by a non-linear system, such
as the human body.
The detector structure is shown in Fig. 2(a). The ECG signal is employed as a trigger
signal for a QRS template generator. Both the ECG and the template signals are filtered
with the same adaptive filter, whose structure, shown in fig. 2(b), is the same as that of a
typical predictive filter [18]. In this case the predictive filter employs the Time-Delay
Neural Network shown in fig. 2(c), which acts as a non-linear filter. The aim of the filters is
to remove the noise components which are correlated with the signal component. The filter
outputs are matched together through a matched filter and eventually the signal peak is
detected by means of a threshold checking technique. The network weights are updated by
means of a gradient-search based algorithm.
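The gradient-trained time-delay structure can be sketched in miniature, in the spirit of fig. 2(c); the network sizes, learning rate, tanh activation, and the one-step-ahead sine prediction task are all illustrative assumptions, not the paper's configuration:

```python
import math, random

# Minimal time-delay neural filter: the last D input samples feed a
# one-hidden-layer network whose weights are updated by gradient
# descent on the prediction error.
random.seed(0)
D, H, lr = 4, 3, 0.05
w1 = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(H)]
w2 = [random.uniform(-0.5, 0.5) for _ in range(H)]

def forward(x):
    h = [math.tanh(sum(w[i] * x[i] for i in range(D))) for w in w1]
    return sum(w2[j] * h[j] for j in range(H)), h

def train_step(x, target):
    y, h = forward(x)
    e = y - target
    for j in range(H):
        g = e * w2[j] * (1 - h[j] ** 2)     # backprop through tanh
        w2[j] -= lr * e * h[j]
        for i in range(D):
            w1[j][i] -= lr * g * x[i]
    return e * e

# Learn to predict a sine wave one step ahead from its D past samples.
sig = [math.sin(0.3 * t) for t in range(2000)]
errs = [train_step(sig[t - D:t], sig[t]) for t in range(D, len(sig))]
print(sum(errs[:100]) / 100, sum(errs[-100:]) / 100)  # error shrinks
```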
Figure 2: (a) The QRS detector based on neural networks. (b) The adaptive filter structure. (c)
The Time-Delay Neural Network employed as non-linear filter.
Table 2: QRS detection results obtained by means of linear and non-linear algorithms.

Filter type                          | Failed detection rate with noisy signals
Neural network based adaptive filter | 2.3%
Linear adaptive filter               | 4.4%
Band-pass filter                     | 12.5%
This system has been tested by using different records of the MIT/BIH database [19] and
its performance has been compared with other 'traditional' filtering methods.
As expected, adaptive non-linear filters work better than linear filters and a noticeable
reduction of erroneous detections is obtained.
Fast applications of filtering techniques are another field where neural networks are
successfully employed. General purpose processors and Digital Signal Processors (DSPs)
are often inadequate when large medical images have to be processed in real-time.
Specific processors, designed to implement neural structures, can be employed for these
time-consuming applications. The so-called 'neuro-processors' [15] or 'ZISC' (Zero
Instruction Set Computer) [20] are nowadays available on the market at a cost which is
comparable to a top-class DSP. Such processors are based on a massively parallel
architecture and are able to implement neural-based algorithms faster than DSPs, even
though their flexibility is still limited.
13.2.2 Neural networks in decision making instruments
Decision making instruments are devices designed to help the physician in the diagnosis
activity. From an engineering point of view, the diagnosis activity can be thought of as an
indirect evaluation of a patient's disease. The core of the diagnosis system, shown in fig. 3,
is the decision making instrument that 'classifies' the patient's disease on the basis of
several patient-related quantities. The figure highlights that such quantities can be obtained
in two
different ways: through the measurement of physical, chemical and biological quantities, and
through subjective evaluations gathered by interviewing the patient about his/her history and
symptoms.
Neural networks can be employed in the measurement instruments, as described in the
previous section, and in the decision making instrument.
Two main problems have to be solved in order to develop an automatic diagnosis system:
the subjective evaluations have to be converted into a numerical format, since neural
networks require numerical values, and the decision algorithms have to be formalized in
order to be implemented by the instrument software.
The coding of medical data not yet expressed in a numerical format can be a difficult
task because this information is often expressed in a linguistic format. One should note
that neural network based algorithms are able to adapt themselves to the input values, so
that a non-optimal choice of the input data encoding almost always affects only the learning
time. However, even though it is impossible to provide universal coding rules, some basic
guidelines can be remembered depending on the input quantity types:
- binary information, such as sex or smoker/non-smoker, is easily coded with a binary
variable; e.g. sex = 1 male, 0 female;
[Figure 3 diagram: physical, chemical and biological quantities, together with symptoms and other subjective evaluations, compose the data vector used for the preliminary diagnosis and the diagnosis.]
- orderable categorical data can be coded by means of a single multi-value variable; e.g.
disease evolution = 0 worse, 1 steady, 2 better;
- non-orderable categorical data could also be tackled by a multi-value variable, but it
is better to employ one binary variable for each category, since this eases the data
interpretation; as an example, if a diagnosis can be related to the presence of either a
symptom A or a symptom B or both, the encoding could conveniently be obtained by
means of two binary variables: symptom A = 1 yes, 0 no; symptom B = 1 yes, 0 no.
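The guidelines above can be sketched as a small encoding routine; the field names and category labels are invented for the example:

```python
# Illustrative encoding of a patient record following the guidelines
# in the text; all names are hypothetical.
def encode_patient(sex, evolution, symptom_a, symptom_b):
    """sex -> binary variable; disease evolution -> single ordered
    multi-value variable; the two (non-orderable) symptoms -> one
    binary variable each."""
    return [
        1 if sex == 'male' else 0,                          # binary
        {'worse': 0, 'steady': 1, 'better': 2}[evolution],  # orderable
        1 if symptom_a else 0,                              # symptom A
        1 if symptom_b else 0,                              # symptom B
    ]

print(encode_patient('female', 'better', True, False))  # [0, 2, 1, 0]
```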
Both measured and encoded data compose the data vector that has to be processed by the
decision making instrument according to a structure similar to the one shown in fig. 4.
a vector of values that represents the estimation of the risk indexes associated with the
different diseases.
Eventually the risk-index vector is sent to the decision rules block. The aim of this stage
is to choose, by means of suitable rules, the correct diagnosis on the basis of the risk-index
values. Several rules can be employed, such as: winner-take-all, rules based on thresholds,
and rules based on boolean operations. One should note that sometimes this stage is
missing, either because it is inherently provided by the topology of the network employed
in the risk-index stage or because it is not needed or not desired. This happens either when
physicians prefer to take the final decision on the basis of the actual risk-index values or
when the required output is not a binary value, as in the case of a drug dosage [27].
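Two of the rules mentioned above can be sketched directly; the risk-index vector and the function names are invented for the example:

```python
# Sketches of decision rules applied to a hypothetical risk-index
# vector (one index per candidate disease).
def winner_take_all(risk):
    """Pick the single disease with the highest risk index."""
    return max(range(len(risk)), key=lambda i: risk[i])

def threshold_rule(risk, thr=0.5):
    """Flag every disease whose risk index exceeds the threshold."""
    return [i for i, r in enumerate(risk) if r > thr]

risk = [0.1, 0.7, 0.6]
print(winner_take_all(risk))   # 1
print(threshold_rule(risk))    # [1, 2]
```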
Table 3 lists some applications of neural networks clustered by network topology.
Let us end this section as well with an example that shows how a neural network based
analyzer can be employed to completely automate the diagnosis procedure for ischemic
events.
The cardiac muscle produces electrical signals that propagate through the patient's body.
These signals can be measured by means of commercial electrocardiographs (ECG), which
are able to simultaneously acquire up to 12 signals. These instruments are able to store
either several short signals or a few longer ones (Holter instruments typically acquire only
two signals, but for up to 24 hours).
Fig. 5 shows an ideal ECG trace. The signal level is of a few millivolts, while the RR
period is related to the patient's activity and is about 1 s at rest.
The analysis of the ECG traces is useful to infer the heart functionality; e.g. the
geometrical characteristics of the ST segment (see fig. 5) are strictly related to the presence
of ischemic events and other coronary artery diseases.
Since the analysis of real traces, such as those shown in the right side of fig. 5, is a long
and time-consuming operation, modern instruments [31, 32] are often designed to perform
an automatic classification and interpretation of the ECG signals.
Several automatic algorithms have been investigated, both conventional and neural. The
example reported here describes a neural approach, proposed by Maglaveras and others [5],
that is designed for the interpretation of ECG signals in order to detect ischemic episodes.
The proposed system, shown in fig. 6, has the typical structure of the generic decision
making instrument previously described. The system first extracts, from the input signals,
the features that are related to the investigated pathology. In this case, ischemia appears in
the ECG as a
[Table 4: comparison of the detection methods; Sensitivity Tp/(Tp+FN): 89% vs. 84%; Predictivity Tp/(Tp+FP): 78% vs. 87%.]
change in the ST segment morphology. The feature extraction algorithm therefore is
designed to identify the ST segment and to compare it against a template that was
previously obtained from the same patient when no ischemic events were present. The
difference with respect to the template is considered to be the feature to be forwarded to
the risk-index estimation stage. The input vector for the risk estimation stage is therefore
composed of 20 samples representing the difference between template and actual signal
during the ST interval.
[Figure 6 diagram: QRS detector, ST samples extraction, template ST samples, risk-index neural network, and decision rules (quantization threshold 0.5).]
The network in the risk-index stage has to manage time series, so a Time-Delay Neural
Network, see fig. 2(c), is employed. The network core is a three-layer MLP with 20 inputs,
2 outputs and 10 sigmoidal neurons in the hidden layer. A back propagation algorithm with
an adaptive learning rate is employed during the learning phase and the network is trained
to recognize four classes: normal ST, depressed ST, elevated ST and unclassifiable ST.
Training and test sets were composed of 120 patterns (50% normal, 25% with ST
depression and 25% with ST elevation) and were extracted from the European ST-T
database.
Table 4 shows a comparison of the network performance with respect to other systems. It
is easy to see that the network obtains results similar to the other solutions both in sensitivity
and predictivity, with the advantage that it does not require the physician to identify the
input/output model. In addition, it is faster than other methods, so it is suitable to be employed
in real-time detector systems.
This type of approach is effective, but the system does not provide any explanation of
the reasons for its result, so physicians lack the possibility to understand the reasons for the
diagnosis. Other approaches have been developed in order to provide a medical description
of the reasons for the network result. As an example, a mixed fuzzy/neural approach has
been employed to provide a linguistic description of the diagnosis [34]. Non-neural
decision techniques can also be employed in order to obtain an explanation of the diagnosis
reasons. The rule based techniques [35] are self-explanatory, but require an accurate
description of the 'human model', a description that is not always available. Some hybrid
techniques, which mix neural and non-neural approaches, can also be employed. As an
example, ProstAsure [36], which is an early detector of prostate cancer based on a neural
network, employs some rules based on the assessed medical knowledge in order to improve
the learning efficiency [37].
13.2.3 Medical data-sets
The training phase is one of the most critical steps during the development of any neural
network based system. In this phase the network designers have to choose a suitable learning
algorithm and have to provide a reliable set of examples, the so called training set. Once the
network has been trained a second set of examples, the test set is required in order to test the
network performance.
Sometimes, in order to avoid the overfitting phenomenon, special techniques, such as the
early stopping algorithm, are employed during the learning phase. In this case the training of
the network is performed by means of the training set, but, periodically, the learning process
is stopped and a cost function (e.g. the mean squared error) is computed on a separate
validation set. According to the early stopping algorithm the network is trained until the cost
function computed on the validation set reaches its minimum value [38].
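As an illustrative sketch (not taken from [38]), the stopping rule can be reduced to finding the epoch at which the validation cost stops improving; the `patience` parameter below is a common practical addition and is an assumption of this sketch:

```python
def early_stop_epoch(val_errors, patience=3):
    """Return the epoch index at which training should stop.

    val_errors: validation cost measured after each epoch.
    Training stops once the cost has not improved for `patience`
    consecutive epochs; the returned index is the epoch of the
    best (minimum) validation cost seen so far.
    """
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break
    return best_epoch

# Validation cost decreases, then rises: overfitting begins after epoch 3.
errors = [0.9, 0.5, 0.3, 0.2, 0.25, 0.3, 0.4, 0.5]
print(early_stop_epoch(errors))  # → 3
```

The network parameters saved at the returned epoch are the ones kept for later use.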
These techniques require a third set of examples to be employed. A meaningful training,
validation, and test of a network thus require a large number of examples. Unfortunately real
examples are not easy to obtain in the medical field: each record in a data-set requires a
patient, and the data have to be manually processed by an expert physician. Furthermore, a
pool of experts is often required to analyze the examples in order to get the correct diagnosis.
Fortunately many medical problems are widespread, so some researchers have decided to
share their data with other researchers. Several collections of medical data are nowadays
available that cover the most common medical problems. Here is a list of some data-sets that
are freely downloadable from the internet.
Proben1 [39] is an archive that contains 15 benchmark problems and a set of
benchmarking rules which can be used to compare algorithms. The
benchmarks cover medical and non-medical fields and are related to
classification and function approximation problems. Proben1 provides data-sets
already subdivided into three classes: training set, validation set and test set.
The benchmarks cover several medical topics: diagnosis of breast cancer,
diagnosis of diabetes, detection of intron/exon boundaries in DNA sequences,
prediction of heart disease and diagnosis of thyroid hyper- or hypo-function.
Proben1 also provides specific benchmarks to be used as reference values
when testing the performance of different networks. For each reference result
the following information is provided: input/output values, normalization
rules, nominal attribute coding, missing attribute values, training algorithm and
termination criteria, cost (or error) function, network topology, classification
method, activation functions and weight initialization.
PhysioBank [40] is a collection of database sites and software. The archive contains
multi-parameter data-sets, ECGs, EEGs, images and other medical
data. An important database that is accessible from PhysioBank is the
European ST-T database, which contains 90 records composed of two-channel
ECG signals digitized at 250 Hz. Each record lasts 2 hours and contains at
least one ischemic episode. The episodes are marked by expert cardiologists.
Wisconsin Datasets [41] contains multi-parametric and image data-sets useful both for
the diagnosis and the prognosis of breast cancer.
The University of South Florida database [42] contains more than two thousand
mammography images for breast cancer diagnosis that are grouped, depending
on the medical diagnosis, into 43 classes. In addition it contains software
which can be used to extract and process images.
13.3. Prediction of the output uncertainty of a neural network
When a neural network is directly involved in the measurement process at least two questions
arise. The first is related to the way the uncertainties of the input values propagate to the
outputs, i.e. how one can estimate the uncertainty that affects the network outputs as a
consequence of the uncertainties which affect the network inputs.
The second question is related to the network training phase. Neural networks behave
according to what they have learned from the examples used during the training phase, but
the information about the uncertainty that affects such examples is usually not made available
to the network. This can lead to either an incorrect training or, at least, a non-optimal training.
This section discusses a possible approach to tackle both these problems. The approach
is suitable for most "memory-less" network topologies that use a supervised training.
13.3.1 Neural networks and measurements
Neural networks are nowadays widely employed in the medical field as well as in industrial
and research environments. Their use is continuously spreading and today applications are
available that embed different flavors of neural networks in complex measurement systems,
even though the network role in the measurement process is not always clear.
As an example of network use, we can focus on the medical field and recall three main
uses that appear to have rather different properties. Neural networks can be employed to:
- discriminate among different pathologies
- determine medical parameters which show the evidence of a pathology
- select the most appropriate therapy
Are all these uses of neural networks really measurements?
According to the classical definition, the measurement of a quantity is the determination
of a number, which represents the ratio between the measured quantity value and a recognized
standard. In a trivial example: we measure the length of a rod by comparing it against a ruler
that has been calibrated against the accepted standard that materializes the length of one
meter.
Neural networks behave rather differently with respect to this paradigm so that an obvious
question arises: how can a neural network produce a measurement?
The answer can be found by recalling that most measurements are indirect, i.e. they are
obtained by using a suitable model to combine several direct measurements, i.e. by combining
measurements that are obtained by means of instruments.
To fix the ideas let us consider the velocity measurement. The mean velocity of a vehicle,
which is defined as the ratio of the distance s covered by the vehicle and the time t it takes to
cover such a distance, can be obtained by measuring the time and the distance and combining
them by means of a suitable model. The model is of course:

v = s / t    (1)

and we use it to obtain the measurement of the velocity m_v by means of the measurements of
space m_s and time m_t:

m_v = m_s / m_t    (2)
From the input-output point of view, a static, memory-less, neural network, i.e. a network
where the outputs depend on the actual inputs but not on the input history, can be depicted as
in fig. 7.
According to this scheme, a generic multi-output neural network is a device which
receives several quantities v_1, ..., v_n as inputs and computes the outputs o_1, ..., o_m by
means of a set of defined equations f_1, ..., f_m:

o_i = f_i(v_1, ..., v_n),    i = 1, ..., m    (3)

where f_i is the relationship, tuned during the training phase, that describes the neural network
behavior regarding the i-th output.
Therefore the neural network actually combines several inputs, i.e. several measurements,
to determine an output, and can thus be thought of as a device that produces indirect
measurements.
However when we perform a conventional indirect measurement we employ a system for
which:
- we a-priori know the combination rule the indirect measurement system has to realize,
i.e. we exactly know the relationship between the direct measurements we employ
and the indirect measurement we wish to obtain.
- we work referring to an accepted standard, i.e. the indirect measurement we are looking
for has its own standard and, in principle, could be measured directly. The combination
rule we use is actually the one (and usually the only one) that allows us to obtain the
same result we could obtain by directly measuring the required quantity.
[Figure: cause-effect scheme of the output uncertainty: the uncertainty on the input quantities, the model uncertainty and the other uncertainty causes all contribute to the uncertainty of the network outputs.]

The effect of the input changes on the i-th output can be previewed by expanding f_i in a first-order Taylor series around the measured values:

o_i ≈ f_i(v_10, v_20, ..., v_n0) + Σ_{j=1..n} (∂f_i/∂v_j)|_(v_10, v_20, ..., v_n0) · δv_j    (4)

where o_i is the i-th output, v_10, ..., v_n0 are the n values that correspond to the actual measured
values, so that f_i(v_10, v_20, ..., v_n0) is the nominal value of o_i, and δv_j are the changes of the
inputs with respect to the measured values.
By using this equation it is therefore possible to preview the output change δo_i
corresponding to any combination of input changes δv_j:

δo_i = Σ_{j=1..n} (∂f_i/∂v_j) · δv_j    (5)

The output change is therefore obtained as a weighted summation of the input changes,
where the weights are represented by the partial derivatives and are often referred to as
sensitivity factors or sensitivity coefficients.
Deterministic model The deterministic point of view tries to determine the maximum
output change that corresponds to the worst combination of input changes. In mathematical
terms this corresponds to a summation of absolute values:

|δo_i|_max = Σ_{j=1..n} |∂f_i/∂v_j| · |δv_j|_max    (6)
Statistical model If the statistical model is employed, as suggested by the ISO guide,
the uncertainties affecting the input quantities are managed as random variables, they are
characterized in terms of their standard deviation and their combination follows the well-known
rules of random processes:

u²(o_i) = Σ_{j=1..n} (∂f_i/∂v_j)² · u²(v_j) + 2 · Σ_{j=1..n-1} Σ_{k=j+1..n} (∂f_i/∂v_j)(∂f_i/∂v_k) · ρ_jk · u(v_j) · u(v_k)    (7)

where u(o_i) is the expected output standard deviation, u(v_j) are the standard deviations of
the input signals and ρ_jk is the correlation coefficient between the j-th input and the k-th input.
The subscripts indicating that all the derivatives are computed at the nominal input point
are omitted for clarity.
Of course, in the absence of correlation among the different inputs, i.e. when
ρ_jk = 0 ∀ j, k, eqn. 7 simply becomes:

u²(o_i) = Σ_{j=1..n} (∂f_i/∂v_j)² · u²(v_j)    (8)
One should note that both equations involve the input uncertainties and the derivatives of the
network function, but there is no need to compute such derivatives analytically: a
numerical approach can be usefully employed.
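A minimal sketch of this numerical approach (the function name and the finite-difference step are assumptions of this example) estimates the sensitivity coefficients by central differences and combines the input uncertainties according to either the deterministic rule of eqn. 6 or the uncorrelated statistical rule of eqn. 8:

```python
import math

def propagate_uncertainty(f, v0, delta_v, h=1e-5):
    """Propagate input uncertainties through f at the nominal point v0.

    f       : callable taking a list of inputs, returning one output
    v0      : nominal (measured) input values
    delta_v : uncertainty of each input (maximum deviation for eqn. 6,
              standard deviation for eqn. 8)
    Returns (deterministic bound, standard-deviation estimate).
    """
    sens = []
    for j in range(len(v0)):
        vp, vm = list(v0), list(v0)
        vp[j] += h
        vm[j] -= h
        sens.append((f(vp) - f(vm)) / (2 * h))  # central difference
    det = sum(abs(s) * abs(d) for s, d in zip(sens, delta_v))           # eqn. 6
    stat = math.sqrt(sum((s * d) ** 2 for s, d in zip(sens, delta_v)))  # eqn. 8
    return det, stat

# Example: velocity v = s/t with s = 100 m, t = 10 s.
det, stat = propagate_uncertainty(lambda v: v[0] / v[1], [100.0, 10.0], [1.0, 0.1])
```

For this example the sensitivities are 1/t = 0.1 and −s/t² = −1, so the deterministic bound is 0.2 and the standard deviation is √0.02 ≈ 0.141.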
Before ending this section it could be interesting to investigate how to move from the
deterministic old way of dealing with the uncertainties to the new statistical approach. Of
course a general solution is not available, but a simple conversion can be obtained if one
knows the shape of the input density function p(), or if such a function can be reasonably
hypothesized. In this case it is possible to compute analytically the variance, and thus the
standard deviation, as:

u²(v_i) = ∫ from −δ(v_i) to +δ(v_i) of v² · p(v) dv    (9)
[Figure: the output uncertainty is computed by combining the uncertainties of the input quantities with the model uncertainty and the other uncertainty causes.]
where δ(v_i) is the maximum expected uncertainty. As an example, if a uniform distribution
is supposed, eqn. 9 leads to:

u(v_i) = δ(v_i) / √3    (10)

When the deterministic model is employed, the three uncertainty causes simply add up:

|δo_c,i| = |δo_i| + δ_m,i + δ_nm,i    (11)

where δo_c,i is the expected output maximum deviation; δo_i is the term that takes the input
effect into account (see eqn. 6); δ_m,i is the deviation due to the model error and δ_nm,i is the
deviation connected to the other non-model-related effects, i.e. to the influence quantities.
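The √3 factor of eqn. 10 can be checked numerically; the sketch below (an illustration, not part of the original text) estimates the standard deviation of a uniform distribution of half-width δ by sampling:

```python
import random, math

delta = 0.05                 # maximum expected uncertainty (half-width)
random.seed(0)
samples = [random.uniform(-delta, delta) for _ in range(200_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))

# The sample standard deviation approaches delta / sqrt(3) ~ 0.0289.
print(std, delta / math.sqrt(3))
```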
The uncertainty combination when the ISO model is employed is more difficult and
questionable. The ISO approach combines the different uncertainty causes in terms of
standard deviation, but both the model error and the other uncertainties exploit a deterministic
effect. If the statistical uncertainty model has to be used anyway, the cumulative standard
deviation corresponding to the three uncertainty causes can be conservatively computed as:

u_c(o_i)² = u(o_i)² + δ_m,i² + δ_nm,i²    (12)

where u_c(o_i) is the expected cumulative standard deviation; u(o_i) is the term that takes the
input effect into account (obtained either from eqn. 7 or from eqn. 8); δ_m,i is the deviation
due to the model error and δ_nm,i is the deviation connected to the other effects.
One should note that the value obtained from eqn. 12 corresponds to a distribution that
has a non-null mean, and thus an interpretation of the uncertainty in terms of probability is
rather difficult.
13.3.5 Taking the uncertainties into account during the training phase
During a supervised training phase some parameters that define the network behavior, such
as weights and biases in a Multi Layer Perceptron (MLP), are modified in order to force the
network to produce the requested outputs.
The training effectiveness depends on the availability of a training set that contains at
least one example of the most important occurrences the network is expected to encounter.
Unfortunately, this constraint is not sufficient in the presence of a non-negligible uncertainty
on the input values.
Several examples of different measurements corresponding to the same nominal condition
are required in order to force the network to take the uncertainty presence into account. In the
absence of such examples, the network tends to learn a specific combination of uncertainties
and lacks part of its generalization ability.
Unfortunately, in most practical situations, the training set dimension is limited and often
very few examples that correspond to the same nominal condition are available. In addition,
a direct method of embedding the uncertainty values in the training phase when the input set
is not wide enough is not yet available.
Many approaches have been proposed to reduce the problems connected to a small
training set that is also affected by noise. Some authors implemented constraints on the
weights or similar approaches during the training phase [44, 45]. Other authors tackled the
problem by adding noise to the training set [46].
This last approach can easily be extended to take the uncertainty presence on the inputs
into account.
When using this approach, the training process is carried out employing a modified stream
of inputs, which is obtained by manipulating the original one. The goal is to provide groups
of examples that highlight the uncertainty presence for all the most important situations.
Each group can be obtained by generating several replicas of the original example. Each
replica is obtained by corrupting the original input values with different combinations of the
expected uncertainties.
There is no definite rule to choose the required number of replicas. One possibility would
be that of mapping all the combinations that can be obtained by adding and subtracting the
expected uncertainties to each input. This would produce, for each example, a group of 2^N
new examples for a network that has N inputs. The training set therefore could become
unacceptably large if N is greater than three or four. In this case the required number of replicas
can be determined by means of a trial-and-error process where the network is trained by
adding replicas until its behavior does not change significantly.
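A sketch of this replica generation (function and variable names are illustrative) enumerates the 2^N sign combinations of the expected uncertainties:

```python
from itertools import product

def make_replicas(example, uncertainties):
    """Return the 2^N replicas of one training example, each obtained by
    adding or subtracting the expected uncertainty of every input."""
    replicas = []
    for signs in product((-1, +1), repeat=len(example)):
        replicas.append([x + s * u
                         for x, s, u in zip(example, signs, uncertainties)])
    return replicas

# One example with two inputs and a maximum uncertainty of 0.05 on each.
reps = make_replicas([0.3, 0.7], [0.05, 0.05])
print(len(reps))  # → 4 replicas for N = 2 inputs
```

The target output of each replica is the same as that of the original example, so the group tells the network that all these input combinations are equivalent.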
One should note that the proposed method can be extended to the entire training set simply
by concatenating replicas of the entire training set. This approach can be convenient when the
training set has a limited dimension or when the analysis of such a training set, in order to
cluster the examples that refer to similar conditions, is difficult or impossible.
The impact of training with the modified set depends on the network response type and is
remarkable if the network has regions in which a steep output change is required. In such a
case the uncertainty presence can produce dramatic output changes that result in an overall
poor behavior, as explained in the two examples of the following section.
13.3.6 Two examples
Two examples have been designed to be easily shown and to highlight the previously
discussed topics i.e. the effect of the conventional and enhanced training, and the computation
of the output uncertainty.
Both examples deal with a network with two inputs and one output so that the input/output
relationship can be shown on a 3D plot. The input and output spaces are confined in the range
[0,1]. The examples use training sets composed of 120 examples based on randomly selected
points (uniform distribution) in the [0,1]×[0,1] square; the points are supposed to be affected
by an uncertainty (uniform distribution) whose maximum amplitude is 0.05, which corresponds
to 5% of the input range.
The aspect of the two training sets is shown in fig. 8. The two training sets look quite
strange, but in reality they have been produced by employing a rather simple equation and
the strange aspect is a consequence of the uncertainty presence.
The hypothesized indirect measurement refers to a bidimensional function z = f(x, y)
which is defined as:

z = f(x, y) = 1 / (1 + e^(−k·s(x, y)))    (13)

where s(x, y) is a linear combination of the two inputs. This function has been chosen for the
examples since it has two major advantages. Firstly, it can easily be approximated even when
a simple single-neuron MLP is employed, since the equation is one of the functions that is
commonly used to implement the neuron activation.
Secondly the function sharpness can easily be controlled by means of the parameter k.
This permits an easy investigation of the effect of the function steepness on the network
behavior with respect to the input uncertainties.
The first example regards the approximation of a "smooth" function, which is obtained
by setting k = 10, while the second one regards an example of a very "steep" function, which
is obtained by setting k = 1000.
Of course in a real situation we would not know the equation at the time of training
and we would not know the actual aspect of the output surface, but it is interesting to see
what the network should approximate. Fig. 9 shows the two surfaces and highlights the great
difference in the steepness of the two transitions.
The network behavior and the effect of the uncertainty presence on both the training and
the expected output uncertainty are dramatically different for the two examples.
The smooth surface is easy to describe: fig. 10 shows the outputs obtained within the
training set by employing either the conventional or enhanced training.
Figure 10: Results of conventional and enhanced training for the case of k = 10.
These pictures are important since they are the only results available at the time of training,
and the quality of the approximation within the training set is often taken as an indicator to
decide how many neurons to employ in the hidden layer and when to stop the training process.
As one can see, the uncertainty presence does not impair the training since the effect of the
uncertainty is to slightly shift the output value. Feeding the network with equivalent values,
as we do during the training with the enhanced training set, does not require a significant
change in the network parameters, and we expect the networks generated by the two different
trainings to behave approximately in the same way.
Figure 11: Output surface obtained after conventional and enhanced training in the case of k =
10. The two surfaces look rather similar.
Once the network is trained we can easily observe the output surface and preview the
effect of the input uncertainties.
Fig. 11 shows the aspect of the output surface obtained by training the network either in
the conventional way or with the enhanced approach. It is easy to observe that the two surfaces
are rather similar, as expected, and therefore that the enhanced training has a negligible effect
on the final result.
Figure 12: Sensitivity of network output with respect to the inputs for the case k =10. The
sensitivities are similar, regardless of the used training.
In order to compute the output uncertainty by employing either eqn. 6 or one of eqns. 7 or
8, we need the derivatives with respect to the inputs. Such derivatives can easily be computed
by means of a numerical approach and their aspect is shown in fig. 12.
The test of the network performance during its use can be obtained by feeding the network
with new randomly generated examples. Fig. 13 shows a set of 100 examples, ordered by
the output value to increase the plot readability. The black asterisks represent the expected
(true) values, the gray hollow circles the network output values, and the lines the standard
deviations (i.e. the expected uncertainties) of the network outputs. It is easy to see that
both the conventional and the enhanced training lead to output estimations that contain the
correct value and, in addition, the two trainings lead to quite similar plots.
The same procedure can be followed when the steep network is used. Fig. 14 is the
counterpart of fig. 10 and shows the output surfaces obtained by training the network with
conventional and enhanced training for k = 1000.
Figure 13: Network behavior with new randomly generated data for the case k = 10. The
examples are ordered by value to improve the trace readability.
Figure 14: Output surfaces obtained by training the network with either the conventional or
enhanced approach for the case of k = 1000. The surfaces are rather different in this case.
The two trainings behave quite differently and the first impression is that the conventional
training leads to a better approximation of the training set, while the enhanced training leads
to a result that is less similar to the original one.
In other words, the conventional training leads to a network that tries to correctly describe
the examples, while the enhanced training produces a smoother surface that does not
completely fit the examples. The equivalent though different examples of the enhanced
training set tell the network to interpolate in order to find an average value suitable for
producing a reasonable result regardless of the actual uncertainty combination. The network
produced by the enhanced training therefore should be more useful in predicting the correct
output (and uncertainty) during its use, even though it is less suitable to describe the training
set. We expect the two networks to behave quite differently and the output surfaces, fig. 15,
confirm this impression.
Fig. 16 shows the output sensitivity with respect to the input values and highlights how
the conventional training has very high peaks of sensitivity near the transition point. The
enhanced training produces a smoother surface and therefore a lower sensitivity to small
changes of the input values. These different sensitivities turn into quite different behaviors
during the network use, as highlighted in fig. 17. Here it is easy to see how the enhanced
training produces a lower overall uncertainty near the transition point, while all the
predicted values still contain the correct value.
Figure 15: Output surface obtained after conventional and enhanced training in the
case of k = 1000.
Figure 16: Sensitivity of network output with respect to the inputs for the case k = 1000.
13.3.7 Summary
At this point it is possible to summarize the topics discussed so far:
- Neural networks can be used in the medical field to produce indirect measurements,
provided that we do not encounter the problem of the non-conventional inputs (i.e. of
input quantities for which the uncertainty cannot be clearly stated). We can employ the
neural network in two ways:
- Non-defining networks: the quantity we want is already defined and could be
measured directly, but the network is easier to use and we do not need to write
down the measurement model.
- Defining networks: the quantity we want is not defined in other ways and the
network defines it (once it has been trained). A new training defines a new quantity.
- We must take the uncertainty presence into account, regardless of the type of network
we employ. A prediction of the output uncertainty can easily be obtained, provided
that the input uncertainties are known and that we can compute the derivatives of
the input/output relationship. Both the deterministic and statistical models can be
employed.
- The uncertainty presence should be taken into account at the training level to help the
network to weight the inputs according to their uncertainties.
Figure 17: Network behavior with new randomly generated data for the case k = 1000. The
examples are ordered by value to improve the trace readability.
- If the training set is very large, it contains enough information to allow the network
to train correctly and nothing else is required.
- If the training set is small and does not contain enough information about the
uncertainties, we can force the network to take the uncertainty presence into account
by creating an enlarged training set. The enlarged training set can be generated
by creating several replicas of the examples each one corrupted by different
combinations of the input uncertainties.
In addition to these points we should recall the problem of the output uncertainty of
the examples belonging to the training set, which affects the training and can produce an
additional model error, and two other subjects that are inherently connected to any neural
network use but that become important in the medical field due to the limited dimension of
the training sets: the unbalanced training set issue and the oversized network issue.
As far as the first issue is concerned we have to observe that the training algorithm tries to
minimize the cumulative error for all the examples. If the training set is mostly composed of
a specific kind of examples, the network will adjust to describe such examples at the expense
of the others. This is always true, but is especially important in the medical field where small
training sets are used and healthy volunteers are easier to find (and measure...) than severely
impaired patients. More details on this issue can be found in several papers such as [47].
The second issue is more subtle and connected to the network design approach which is
synthesized by the sentence "If the network does not describe the training set... increase the
number of neurons!" This approach can be dangerous in the medical field, again due to the
limited dimension of the training sets.
Let us consider an example based on an MLP with N inputs, M neurons in the hidden
layer and one output. The MLP will have: N × M weights plus M biases for the connection
between the inputs and the hidden layer, and M weights plus one bias for the connection between
the hidden layer and the output, i.e. (N + 2) × M + 1 parameters to be identified during
the training. As an example we can discuss a relatively small MLP with 4 inputs and 5 neurons in
the hidden layer, which contains 31 parameters to identify: how many examples do we need to
have a satisfactory identification (and not a net that describes the examples perfectly instead
of approximating the population)?
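The count follows directly from the layer sizes; a short sketch (illustrative, not from the original text) sums the weights and biases of both layers:

```python
def mlp_parameter_count(n_inputs, n_hidden):
    """Parameters of an MLP with one hidden layer and a single output:
    input-to-hidden weights and biases, plus hidden-to-output weights
    and the output bias."""
    hidden = n_inputs * n_hidden + n_hidden   # N*M weights + M biases
    output = n_hidden + 1                     # M weights + 1 bias
    return hidden + output

print(mlp_parameter_count(4, 5))  # → 31
```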
Figure 18: Output surface with conventional training and network performance within test set.
Figure 19: Comparison of the surfaces obtained with the conventional and enhanced training.
Two examples of enhanced training with different uncertainties.
The criterion used to flag the patients as doubtful is somewhat arbitrary and has to be chosen
as a trade-off between the number of tolerated errors and the number of unclassified patients.
This issue will be more extensively discussed in the second example; in this case a patient is
classified as high risk if the network output minus its uncertainty is above 0.5, low risk if the
network output plus its uncertainty is below 0.5, and doubtful otherwise.
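This decision rule can be sketched as follows (the function name and the threshold parameter are illustrative):

```python
def classify_risk(output, uncertainty, threshold=0.5):
    """Classify a patient from the network output and its uncertainty:
    'high' only if the whole uncertainty interval lies above the
    threshold, 'low' only if it lies entirely below, 'doubtful' otherwise."""
    if output - uncertainty > threshold:
        return "high"
    if output + uncertainty < threshold:
        return "low"
    return "doubtful"

print(classify_risk(0.8, 0.1))   # → high
print(classify_risk(0.2, 0.1))   # → low
print(classify_risk(0.55, 0.1))  # → doubtful
```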
By employing this choice the network gives 71 correct predictions (instead of the 81 of the
simple data analysis), 2 wrong predictions (instead of 3) and flags 10 patients as doubtful. One
should note that the reduction in the number of prediction errors is remarkable: since one of
the patients died of Acute Respiratory Distress Syndrome, which is an unpredictable event,
the surgery-related problems reduce from two to one (at the expense of 10 unclassified
patients).
A more complete validation of the network behavior and a complete discussion about
the correct criterion to flag a patient as doubtful would require testing the network with
new medical cases. Unfortunately, the probability that a patient flagged as high risk will
encounter severe problems is very high, and therefore ethical reasons would suggest avoiding
interventions on these patients (this is the purpose of the network!). It is not likely there
will be other examples of high-risk patients who undergo an operation. This means that the
training cannot be substantially improved and the validation will mainly be one-way (i.e. low-risk
or doubtful patients who are operated on and encounter severe problems).
As a final comment to this example we could ask ourselves: is all the work with the neural
networks worthwhile in this application? The answer is open: after all we have only two
inputs and their combination is something like a weighted mean; we could obtain similar
prediction results using a conventional statistical approach. In addition, the final patient
classification into the three categories has been made in an arbitrary way: the statistical
approach could have been used instead.
Figure 21: Mean and standard deviation of the four parameters within the three groups.
The discriminant score of each pathology is a linear combination of the four clinical tests:

d_k = Σ_{i=1..4} a_ik · x_i + c_k,    k ∈ {A, B, E}    (14)

The discriminant score therefore requires the identification of a total of 15 parameters. Eqn. 14 can
be rewritten in matrix form:
Figure 22: The neural network approach based on three MLPs plus a competitive layer. On the
right the enhanced version with the guard neuron.
d = A·x + c    (15)

where:

d = [d_A, d_B, d_E]^T,    A = {a_ik} (a 3 × 4 matrix of coefficients, one row per pathology),
x = [x_1, x_2, x_3, x_4]^T,    c = [c_A, c_B, c_E]^T    (16)
The identification of A and c can be obtained by computing the pooled covariance matrix
S of the data, the mean value x̄_k of each test within each pathology group and the vector q
of the frequencies of each pathology:

a_k = S⁻¹ · x̄_k,    c_k = ln(q_k) − (1/2) · x̄_k^T · S⁻¹ · x̄_k    (17)

where a_k is the k-th row of A.
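Under the standard linear-discriminant assumptions, the identification of eqn. 17 can be sketched as follows (this is a generic sketch; the toy data and names are invented for illustration, not taken from the clinical study):

```python
import numpy as np

def fit_discriminant(groups, frequencies):
    """Identify A and c of a linear discriminant score from per-class
    samples, using the pooled covariance matrix S, the class means
    and the class frequencies q (standard LDA identification)."""
    means = [g.mean(axis=0) for g in groups]
    n_total = sum(len(g) for g in groups)
    # Pooled covariance: weighted average of the within-class covariances.
    S = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups) / (
        n_total - len(groups))
    S_inv = np.linalg.inv(S)
    A = np.array([S_inv @ m for m in means])                 # rows a_k
    c = np.array([np.log(q) - 0.5 * m @ S_inv @ m
                  for q, m in zip(frequencies, means)])
    return A, c

# Toy data: two classes, two clinical tests each.
rng = np.random.default_rng(0)
g1 = rng.normal([0, 0], 1.0, size=(50, 2))
g2 = rng.normal([2, 2], 1.0, size=(50, 2))
A, c = fit_discriminant([g1, g2], [0.5, 0.5])
scores = A @ np.array([2.0, 2.0]) + c  # the largest score wins
```

A new patient is assigned to the pathology whose discriminant score is the highest.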
The neural network approach can use a Multi Layer Perceptron (MLP) architecture. We
can conceive at least three possibilities:
a) One single-level, single-output MLP whose output takes three levels (one per pathology).
b) One single-level, triple-output MLP.
c) Three single-level, single-output MLPs (each MLP trained to identify one of the three
pathologies).
The solution with three MLPs allows simpler networks to be employed, with 2 or 3
neurons in the hidden layer for each network. This in turn dramatically reduces the number
of parameters of each network and greatly speeds up the training. Each network
is trained to produce an output in the range of 0 to 1 (no identification to complete
identification). The three outputs have to be combined together to obtain the required
classification. The neural network equivalent of the linear discriminant score is the use of
a competitive layer, i.e. a winner-takes-all approach, as shown in fig. 22.
In order to compare the performance of the linear discriminant score with respect to the
neural network, the patients were divided into a training set composed of 55 patients (13
asthma, 29 bronchitis, 13 emphysema) and a test set composed of 103 patients (24 asthma,
50 bronchitis, 29 emphysema). The training set was used to compute A and c, for the
discriminant score approach, and to train the neural network.
The selection of the patients to be included in the training set was manually performed by
a physician who chose among the patients of the first group. The manual selection is required
because there are several different flavours of each pathology. If we had a very large training
set we would be sure of having all the important aspects included in the training set, but with
a small training set we must be sure to include at least one example of each flavour in the
training set.
Once the MLPs are trained with the enhanced training approach described in the previous
section, the neural system fails only 8 times (15%) within the training set and 23 times (22%)
within the test set.
The linear discriminant score system is able to identify 43 patients within the training set
with 12 errors (22%) and 70 patients within the test set with 33 errors (32%).
The combination MLPs + CL therefore performs better than the linear discriminant
approach, even though we still have several errors. This behavior is intrinsically connected
with the CL use: the CL always produces a winner, even when no network is really activated,
therefore it is reasonable to expect that most errors can be avoided by discarding too-weak
winners. This behavior can be obtained by employing a modified Competitive Layer, as shown
in the right-hand side of fig. 22.
Of course, the problem then becomes the choice of the correct guard level. A high guard
level avoids most of the errors, but at the expense of several unclassified patients;
a low guard level reduces the number of unclassified patients, but also the guard
effectiveness. The choice is a trade-off between the two requirements and can be made
by observing the guard effect on the training set, as shown in fig. 23.
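A minimal sketch of the guarded competitive layer (hypothetical code; the guard level of 0.5 is an illustrative value, tuned in practice on the training set as described above):

```python
import numpy as np

PATHOLOGIES = ("asthma", "bronchitis", "emphysema")

def guarded_winner(outputs, guard_level=0.5):
    """Winner-takes-all with a guard: return the winning pathology only
    when its activation exceeds the guard level; otherwise flag the case
    as unclassified (None) instead of risking a gross error."""
    outputs = np.asarray(outputs, dtype=float)
    k = int(np.argmax(outputs))
    return PATHOLOGIES[k] if outputs[k] > guard_level else None

print(guarded_winner([0.9, 0.2, 0.1]))   # asthma
print(guarded_winner([0.3, 0.35, 0.2]))  # None: the winner is too weak
```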
The guard neuron highlights doubtful or unclassified cases and avoids gross errors,
but we still lack a classification reliability indicator. Such an indicator can be obtained
by employing a further post-processing step that takes the presence of input uncertainty
into account. Firstly, the three outputs are combined to highlight the evidence of one
pathology with respect to the other two, thus computing an evidence index e_k:
e_k = n_k \prod_{j \neq k} (1 - n_j),    k, j \in \{asthma, bronchitis, emphysema\}
where k and j are the pathology indexes, which follow modulo-three algebra (i.e., if k =
emphysema then k + 1 = asthma), and n_k and n_j are the corresponding network outputs.
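Under this definition, the evidence index rewards an output that is high while the other two are low. A minimal sketch (hypothetical code, assuming the three network outputs are already available):

```python
import numpy as np

def evidence_indexes(n):
    """e_k = n_k * prod_{j != k} (1 - n_j) over the three network
    outputs n = (n_asthma, n_bronchitis, n_emphysema)."""
    n = np.asarray(n, dtype=float)
    return np.array([n[k] * np.prod(1.0 - np.delete(n, k))
                     for k in range(len(n))])

# One strongly activated network yields one dominant evidence index:
# evidence_indexes([0.9, 0.2, 0.1]) -> [0.648, 0.018, 0.008]
print(evidence_indexes([0.9, 0.2, 0.1]))
```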
The uncertainty of the evidence indexes can then be computed as:
u_c^2(e_i) = \sum_{j=asthma}^{emphysema} s_{ij}^2 \, u^2(p_j)    (19)
where u_c(e_i) is the combined standard uncertainty of the ith evidence index, i.e., the
expected standard deviation of the ith evidence index; u(p_j) is the standard uncertainty
of the jth clinical test; and s_{ij} are the sensitivity coefficients of the ith evidence
index with respect to the jth clinical test.
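Given a matrix of sensitivity coefficients and the standard uncertainties of the clinical tests, eq. (19) can be sketched as follows (hypothetical code; how the s_ij are obtained, e.g. by numerically perturbing the network inputs, is not shown):

```python
import numpy as np

def combined_uncertainty(S, u_p):
    """Eq. (19): u_c(e_i) = sqrt(sum_j s_ij^2 * u(p_j)^2).

    S   : (n_indexes, n_tests) matrix of sensitivity coefficients s_ij
    u_p : standard uncertainties u(p_j) of the clinical tests
    Returns the combined standard uncertainty of each evidence index."""
    S = np.asarray(S, dtype=float)
    u_p = np.asarray(u_p, dtype=float)
    return np.sqrt((S ** 2 * u_p ** 2).sum(axis=1))
```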
The evidence indexes and their uncertainties can finally be used to decide whether the
pathology is clear or doubtful.
Again, several criteria can be used to flag a patient as doubtful, such as the actual value
of the evidence index and its uncertainty, or the difference between the two highest indexes.
The criterion selection is arbitrary, though not critical, and the expected performance of
each criterion can be compared by plotting the number of errors vs. the number of missed diagnoses
within the training set. Fig. 23 shows the results obtained by the simple guard neuron and
by two different criteria based on the evidence indexes. It is easy to see that the evidence
indexes behave better than the guard neuron and that the results of the different criteria
are rather similar.
Figure 23: Trade-off between errors and missed diagnoses for the guard neuron and different
criteria on the evidence indexes.
Now that all the tuning has been done, it is possible to compare the different
approaches on the test set. Fig. 24 shows the number of errors and missed diagnoses for
the guard-based neural network, with different guard levels, in comparison with the linear
discriminant score. Fig. 25 shows the performance of the evidence-index approach with
different thresholds. The two figures show that the evidence-index system behaves better
than the others not only in the training set, but in the test set too.
As we did in the previous example, we could ask ourselves whether introducing the neural
network is worthwhile. Actually, the neural network approach seems easier to implement than
the conventional statistical approach and seems to work better. However, we employed only
linear statistics; a non-linear statistical approach would probably produce results close to
those of the MLP.
The neural network with the guard neuron seems to be much better than the conventional
statistical approach, but similar results could surely be obtained by further manipulating
the triplet of discriminant scores, thus introducing the doubtful class into the statistical
approach.
Figure 24: Performance comparison of the linear discriminant score and the MLP+CL+guard-neuron system.
Figure 25: Performance comparison of the evidence index method with different thresholds.
Index
A
aircraft inspection, 207, 216
Akaike information criterion (AIC), 65
analog computer, 275
analog hardware, 23
artificial cochlea, 30
artificial nose, 30
artificial retina, 29
artificial tongue, 30
ARX model, 91
ARX predictor, 95
asymptotic tracking, 104
asynchronous machines, 263
augmented reality (AR), 273, 274
auto-associative neural networks, 280
autocorrelation function, 126
B
backpropagation through time (BPTT), 61
backpropagation, 60, 69, 85, 88, 98, 100, 107, 122, 140, 142, 148, 151, 285, 294, 297, 321
Bayes estimation, 47
Bayesian, 315
bearing, 167, 173
Bellman equation, 107, 109, 111
bias-variance trade-off, 63
bipolar-junction transistors (BJT), 265
black-box, 45, 49, 54, 57, 62, 68, 76
breast cancer detection, 147, 152, 160
C
calibration, 11, 14, 16, 36
cerebellar model articulation controller (CMAC), 53, 60, 69
certainty equivalence, 109
chaotic system, 120, 124, 127, 132, 139
classification, 189, 201, 207, 210, 215
CO2 laser, 221, 223, 240
competitive layer, 316, 317
composite system, 23, 27, 228, 232, 237, 240
computational paradigm partitioning, 27
computational paradigm synthesis, 27
condition monitoring, 167, 175, 185
confidence interval, 12, 233, 236, 242
configurable digital hardware, 24
configurable software simulator, 25
conformable model, 275, 284, 288
control, 190, 196, 201, 208, 213, 217
controllability, 96
D
dead-beat controller, 100
decision making instruments, 291, 294
decision system, 191, 193, 204, 216
defect, 168, 175, 182, 188
defuzzification, 259
design methodology, 20, 27
detection, 189, 197, 200, 204, 207, 209, 216
deterministic model, 303
diagnosis, 167, 175, 178, 185, 187
digital dedicated hardware, 24
digital electronic sensor design, 145, 161
digital imaging systems, 145, 161
digital weight, 24
discriminant score, 315
distributed measurement system, 35
disturbances, 93
dual control, 108
dual heuristic programming, 114
dynamic backpropagation, 61
dynamic neural architectures, 51, 54, 77
dynamical system, 120, 123, 128, 132
E
electro-cardiogram (ECG), 292, 296, 299, 319
electromagnetic (EM), 273, 282, 284
electronic design automation (EDA), 273, 282, 289
Elman's network, 122
embedding delay, 129, 142
embedding dimension, 121, 129, 138, 142
embedding parameters, 121, 128, 139, 143
embedding theorem, 121, 128, 129
energy, 168, 177, 180
enlarged training set, 311, 313
errors in variables (EIV), 72
exact tracking, 105
F
false nearest neighbors method, 131
feature vector, 200, 210
feature extraction, 231, 235
feature selection, 235, 240
feedback linearizability, 97
feed-forward neural network, 175, 178
filter, 292, 319
finite element method (FEM), 285
finite impulse response multilayer perceptron (FIR-MLP), 58, 61
flow measurements, 268
flux observer, 264
Fourier transform, 119, 127
four-points technique, 251
G
generalization, 62, 67, 73
gradient algebra, 85
gradient forward-propagation, 86, 89
grey-box, 45
H
Hammerstein model, 56
hardware neural networks, 273, 276, 280, 289
hardware/software partitioning, 28
health, 167, 177, 179, 182, 187
hearing sensor, 30
hidden, 190, 193, 197, 202, 204, 211
holographic memory, 191, 197
hot-wire sensors, 268
human-computer interface (HCI), 273
hybrid-neural system, 75
I
image compression, 153
image fusion, 151, 158
image quality contributors, 148, 159, 161, 162
image sensor, 29
image shape and segmentation, 152, 154, 159
image system design, 147, 161
image, 189, 194, 198, 200, 204, 208, 216
independent component analysis (ICA), 71, 119, 121
internal model control, 103
J
Jordan's network, 122
K
K nearest neighbour classifier (KNN), 234, 242, 246
Kalman filter, 110
keyhole, 222, 225, 241
Kolmogorov's entropy, 120, 125, 133, 140
L
laser cutting, 220, 223, 236
laser processing, 219, 228, 243
laser welding, 220, 224, 240
least square (LS) estimation, 47, 59, 72
leave one out, 219, 223, 234, 236, 239
Levenberg-Marquardt, 60
l-finiteness, 84
linear autoregressive model, 122
Lipschitz quotient, 66
Lyapunov's function, 97, 99
Lyapunov's exponents, 120, 125, 128, 132, 140, 142
Lyapunov's spectrum, 121, 128, 132, 142
M
machine tool, 168, 186
manufacturing, 167, 172, 186
maximum likelihood (ML) estimation, 47, 75
measurement, 9, 189, 190, 192, 196, 199, 200, 207, 213, 216, 273, 275, 287
medical data set, 294, 298
medical instruments, 291, 319
membership function, 257
military applications, 147, 156, 161
minimum description length (MDL), 65
minimum phase system, 102
mixture of experts (MOE), 75
model order selection, 65
model reference control, 102
model uncertainty, 302, 304
model validation, 62, 76, 287
modeling capability, 52, 54
modular neural network, 73, 198, 202, 205
multilayer perceptron (MLP), 51, 58, 60, 65, 68, 120, 140, 142, 145, 147, 152, 160, 165, 291, 293, 298, 306,
311
multi-recurrent neural network, 122
multisensor image classification, 148, 158
N
NARMAX model, 57, 64, 73
NARX model, 57, 62, 64, 66, 73, 92, 103, 109
NARX predictor, 95
Nd:YAG laser, 221, 224
network information criterion (NIC), 65
networked sensing system, 35
neural implementation, 23
neural paradigm, 20, 23
neuro-dynamic programming, 110
neuro-fuzzy, 260
NFIR model, 57, 61, 64, 66
NOE model, 57
nonlinear autoregressive model, 122
nuclear magnetic resonance imaging, 147, 151, 159
O
observability, 90
Occam's razor, 58
odor sensor, 30
optimal control, 106
overfitting, 298
P
parameter estimation, 44, 47, 58, 67
pattern recognition and classification, 149, 150, 156, 161
penetration depth, 222, 226, 243
perceptron, 50, 189, 193, 196, 214
permeability, 249, 253
permittivity, 249, 251, 253
phase-locked loops, 149, 161
physiologically motivated pulse coupled neural network (PCNN), 148, 151, 158
PID controllers, 115
plasma, 222, 235
R
raceway, 170, 177, 180, 182
radial basis function (RBF) network, 53, 60, 69, 120
random pulse, 276, 290
real world, 273, 288
real-time recurrent learning (RTRL), 62
recurrent neural network, 120, 122, 173, 187
reference model, 101
reference signal, 101
regressor, 51, 57, 63, 73
regularization, 59, 69
reinforcement learning, 110
remote sensing, 34, 147, 158
resistance, 250, 253, 255, 257, 263, 266, 269
resistivity, 249, 255, 268
robot, 189, 193, 199, 206, 213, 216
S
sensitivity, 297, 303, 308, 310, 318
sensor diagnosis, 33
sensor enhancement, 28
sensor fusion, 32, 189, 192, 198, 206, 212
sensor linearization, 31
separation, 43
severity, 175, 178, 182
signal processing, 119, 143
soft-sensors, 262
space reconstruction, 121, 140
stability, 96
stabilizability, 97
stabilization, 96
statistical learning theory, 65
statistical model, 303, 305, 311, 315, 318, 321
stereoscopic, 200, 204, 208
support vector machines (SVM), 72
symbiont, 274
synaptic, 275, 280
synthetic aperture radar (SAR), 147, 157, 164
system specification, 27
system validation, 233
T
tactile sensor, 30
tapped-delay line operator, 80
temporal backpropagation, 61
thermistors, 269
traceability, 16
tracking, 101
training, 190, 193, 197, 201, 205, 211, 217, 231, 235, 239, 242
tree-like networks (TLN), 148, 156
U
uncertainty combination, 304, 309
uncertainty propagation, 302
uncertainty, 12, 291, 299, 301, 304, 310, 312
unfolding-in-time, 61
universal approximation, 52, 68, 83
unmodeled dynamics, 94
V
vibration, 168, 173, 176, 185
virtual environment (VE), 273, 288
virtual prototyping environment (VPE), 273, 275, 282, 287, 289
virtual reality (VR), 273, 274
virtual sensor, 34
virtual workbench, 275, 284
virtual world, 273
virtualized reality environment (VRE), 273, 275, 288
visual sensor, 29
W
wavelet, 200, 210
Wheatstone bridge, 255
white-box, 45
Wiener model, 56
Wiener-Hammerstein model, 56
Z
zero dynamics, 102
Author Index
Ablameyko, S. 1
Alippi, C. 219
Baglio, S. 249
Blom, A. 219
Ferrari, S. 19
Ferrero, A. 9
Gao, R.X. 167
Giakos, G.C. 145
Golovko, V. 119
Horvath, G. 43
Maniakov, N. 119
Marchesi, R. 9
Nataraj, K. 145
Pacut, A. 79
Parvis, M. 291
Patnekar, N. 145
Petriu, E.M. 273
Piuri, V. 1, 19
Savitsky, Y. 119
Siegel, M. 189
Vallan, A. 291