Fuzzy Neural Computing of Coffee and Tainted-Water Data From An Electronic Nose
Abstract
In this paper we compare the ability of a fuzzy neural network and a common back-propagation network to classify odour samples that were obtained by an electronic nose employing semiconducting oxide conductometric gas sensors. Two different sample sets have been analysed: first, the aroma of three blends of commercial coffee, and secondly, the headspace of six different tainted-water samples. The two experimental data sets provide an excellent opportunity to test the ability of a fuzzy neural network due to the high level of sensor variability often experienced with this type of sensor. Results are presented on the application of three-layer fuzzy neural networks to electronic nose data. They demonstrate a considerable improvement in performance compared to a common back-propagation network.
world data are often noisy, distorted and incomplete. In addition, it is difficult to handle non-linear interactions mathematically. In many applications, the systems cannot be modelled by other approximate methods such as expert systems. In cases where the decision making is sensitive to small changes in the input, neural networks play an important role. Nevertheless, ANNs have some potential disadvantages as well, since the choice of the way in which the inputs are processed is often largely subjective and different results may be obtained for the same problem. Furthermore, deciding on the optimal architecture and training procedure is often difficult, as stated above. Many problems would need different subjective considerations, including speed, generalization and error minimization. ANNs have other potential disadvantages as well. For example, there is very little formal mathematical representation of their decisions and this has been a major hurdle in their application in high-integrity and safety-critical systems.

Multi-layer perceptrons are the most commonly used ANN in pattern classification and typically comprise an input layer, an output layer and one or more hidden layers of nodes. Most of our electronic nose work has employed two-layer networks (excluding the input layer), since the addition of further hidden processing layers does not provide substantial increases in discrimination power [3]. We have used an advanced BP method called Silva's method [4] in order to train the neural networks in the conventional way on the electronic nose data (described later) and then compare the results with fuzzy neural models.

3. Experimental details

3.1. Fuzzy neural model

Fuzzy logic is a powerful technique for problem solving which has found widespread applicability in the areas of control and decision making. Fuzzy logic was invented by Zadeh in 1965 and has been applied over recent years to problems that are difficult to define by precise mathematical models. The approach is particularly attractive in the field of decision making, where information often has an element of uncertainty in it.

The theory of fuzzy logic in turn relates to the theory of fuzzy sets, where an effort is made to distinguish between the theory of probability and possibility. There is more than one way in which fuzziness can be introduced into neural networks and hence different workers mean different things by the term 'fuzzy neural network'. Some researchers define these networks as having fuzzy inputs and fuzzy outputs and hence try to fuzzify (i.e., assign a membership value to data values within the range 0-1 using a possibility distribution) before data are presented to the ANN. This concept can obviously be further extended, as described, for example, by Zadeh [5], where the inputs and outputs are truly fuzzified by their transformation into linguistic terms. So rather than having a particular numerical value (e.g., in the input or output), we can describe values linguistically as very low, low, moderate, high, very high, etc. This kind of fuzzification, though tempting for some applications (e.g., classifying the quality of odours), would not be suitable for others in which the boundaries are hard to specify. Fuzzy logic attempts to distinguish between possibility and probability as two distinct theories governed by their own rules. Probability theory and Bayesian networks can be used where the events are repetitive and statistically distributed. The theory of possibility is more like a membership-class restriction imposed on a variable, defining the set of values it can take. In the theory of probability, for any set A and its complement A^c, A ∩ A^c = ∅ (the null set), which is not true in the case of the theory of possibility. Possibility distributions are often triangular and so similar in shape to normal distributions, with the mean value having the highest possibility of occurrence, which is one. Any value outside the min-max range has a possibility of occurrence of zero. Hence, in mathematical terms, the possibility that a_j is a member of the fuzzy set X = {a_1, a_2, ..., a_n} is denoted by its membership value M(a_j). This membership value of a_j in X depends upon the mean, minimum and maximum of the set X. An introductory treatment of the theory of fuzzy logic is given by McNeill and Freiberger [6]. A more mathematical description of fuzzy sets and the theory of possibility is available in Dubois and Prade [7].

We have made use of the fuzzy neural model proposed initially by Gupta and Qi [8]. This model challenges the manner in which conventional networks are trained with random weights, because these random weights may be disadvantageous to the overall training process. Let us consider a 12 × 3 × 3 neural network architecture. At the end of training we hope to have an optimal point in 51 (12 × 3 + 3 × 3 + 3 + 3)-dimensional space that describes the best set of weights with which to classify the training patterns, and also to predict unknown patterns. This optimal point is harder to achieve in practice as the data become more non-linear, additional difficulties being caused by noise in the data. The main problem with random weights is that we usually start the search from a poor point in space which either slowly, or perhaps never, takes us to the desired optimal point, i.e., a global minimum. A suitable starting point, preferably dependent on the kind of training data, is highly desirable. It can speed up training, reduce the likelihood of getting stuck in local minima and take us in the right direction, the direction for the global minimum. The result is a better set of weights that will better classify the test patterns. The fuzzy neural network (FNN) approach adopted here attempts to do exactly this. It makes use of possibility distributions [9], which helps in determining the initial set of weights. These weights themselves are fuzzy in nature and depend entirely on the training-set distribution. Here the neural network reads a file of weights before training. These weights are generated in advance by performing calculations on a possibility distribution function as shown in Fig. 1. Once the network is trained, the final weights are no longer fuzzy but can take any
real value. These saved weights are then used with the test data for recognizing new patterns.

Fig. 1. (a) The possibility distribution S(v; X, B, Y) is used to determine the membership value of a measurement v. S(v; X, B, Y) is given by 0 when v ≤ X, 2(v - X)²/(Y - X)² when X < v ≤ B, 1 - 2(v - Y)²/(Y - X)² when B < v ≤ Y, and 1 when v > Y. Note that Y is the mean value, X is the minimum value and B is the bandwidth in this possibility distribution. (b) The membership function M is related to the function S by M = 1 - S(v; Y, Y + B/2, Y + B) when v > Y, and M = S(v; Y - B, Y - B/2, Y) when v ≤ Y. In both panels the horizontal axis is the measurement v.
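To make the formulas of Fig. 1 concrete, here is a minimal sketch of the S-function and of the membership function M derived from it. This code is not from the original paper: the function names and the use of NumPy are our own choices, and the arguments follow the caption of Fig. 1 (Y is the mean, X the minimum and B the bandwidth). The same membership helper is reused in a later sketch.

```python
import numpy as np

def s_function(v, x, b, y):
    """Zadeh S-function S(v; X, B, Y) as given in the caption of Fig. 1(a)."""
    v = np.asarray(v, dtype=float)
    rise = 2.0 * ((v - x) / (y - x)) ** 2          # quadratic rise between X and B
    fall = 1.0 - 2.0 * ((v - y) / (y - x)) ** 2    # quadratic approach to 1 between B and Y
    return np.where(v <= x, 0.0, np.where(v <= b, rise, np.where(v <= y, fall, 1.0)))

def membership(v, mean, bandwidth):
    """Membership M(v) of Fig. 1(b): 1 at the mean, 0.5 at mean +/- B/2, 0 beyond mean +/- B."""
    v = np.asarray(v, dtype=float)
    return np.where(
        v > mean,
        1.0 - s_function(v, mean, mean + bandwidth / 2.0, mean + bandwidth),
        s_function(v, mean - bandwidth, mean - bandwidth / 2.0, mean),
    )
```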
3.2. Electronic nose

The present work is concerned with the application of FNNs to electronic nose data. An electronic nose comprises a set of odour sensors that exhibit differential response to a range of vapours and odours [10]. Previous work has been carried out in the Sensors Research Laboratory and the Intelligent Systems Engineering Laboratory at the University of Warwick to identify alcohols and tobaccos [11,12].
Here data were collected from an array of semiconducting oxide gas sensors (i = 1 to n) in response x_ij to a measurand j in terms of a fractional change in steady-state sensor conductance G, namely,

x_ij = (G_odour - G_air)/G_air    (1)

This was chosen because it was found to reduce sample variance in earlier work on odours [10] and is recommended for use with semiconducting oxide gas sensors in which the resistance falls with increasing gas concentration.
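Eq. (1) is a one-line computation; the short illustration below is ours, not the authors', and uses made-up conductance values for a hypothetical four-sensor array.

```python
import numpy as np

def fractional_conductance_change(g_odour, g_air):
    """Eq. (1): x_ij = (G_odour - G_air) / G_air, evaluated element-wise over a sensor array."""
    g_odour = np.asarray(g_odour, dtype=float)
    g_air = np.asarray(g_air, dtype=float)
    return (g_odour - g_air) / g_air

# Hypothetical steady-state conductances (arbitrary units) in clean air and in the odour.
g_air = np.array([1.00, 0.80, 1.20, 0.90])
g_odour = np.array([1.30, 1.10, 1.50, 0.99])
print(fractional_conductance_change(g_odour, g_air))   # dimensionless responses x_ij
```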
The electronic nose comprised a set of either 12 or four commercially available Taguchi gas sensors (Figaro Engineering Inc., Japan); see Table 1 for the choice of sensors. The odour sensors have a sensitivity to certain gases at the ppm level. Measurements were made under constant ambient conditions (e.g., at 30 °C and 50% RH). We shall now briefly describe the two data sets.

3.3. Coffee data

The coffee data set provides an interesting challenge for the fuzzy neural models. It consisted of 89 patterns for three different commercial coffees, 30 replicates of coffee A (a [...] model. It was soon realized that 100% recognition was unlikely to be achieved. The testing was performed using n-fold cross-validation.¹ The initial data set was segmented to give either a training set of 80 patterns and a test set of nine patterns for the first two coffees (this was done for nine folds), and then 81 patterns for training and eight patterns for testing the last coffee. This was necessary because the third class of coffee had one missing pattern. Each pattern consisted of 12 sensor values x_ij. The patterns constituting the training and testing sets were rotated so that in every fold we had a unique training and testing set. The 12 × 3 × 3 architecture was trained using both Silva's method (a modification of the standard non-fuzzy back-propagation method) and its fuzzy counterpart. Although the weights for our fuzzy model were within the [0,1] range, the sensor data themselves were not coded in any particular way.

¹ A bootstrapping method could have been used to improve the true error prediction, but we wanted to compare the results with earlier work that used cross-validation [13].

Table 1
Commercial semiconducting oxide gas sensors from Figaro Engineering Inc., Japan, used to analyse the coffee and water samples

Sensor No.   Coffee samples   Water samples
TGS 800      ✓                ✓
TGS 815      ✓                ×
TGS 816      ✓                ×
TGS 821      ✓                ×
TGS 823      ✓                ×
TGS 824      ✓                ×
TGS 825      ✓                ✓
TGS 830      ✓                ✓
TGS 831      ✓                ×
TGS 842      ✓                ×
TGS 880      ×                ✓
TGS 881      ✓                ×
TGS 882      ✓                ×
TGS 883      ×                ×
Total        12               4
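The fold rotation described above for the coffee data (nine folds with nine test patterns and a tenth with eight, so that each of the 89 patterns is tested exactly once) can be sketched as follows. This is our own illustration rather than the authors' code, and it deliberately ignores how the 89 patterns are ordered by coffee type.

```python
import numpy as np

def coffee_folds(n_patterns=89, n_folds=10):
    """Yield (train, test) index splits: nine folds of nine test patterns and one of eight."""
    indices = np.arange(n_patterns)
    for test_idx in np.array_split(indices, n_folds):   # sizes 9, 9, ..., 9, 8
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx

for fold, (train_idx, test_idx) in enumerate(coffee_folds(), start=1):
    print(fold, len(train_idx), len(test_idx))          # 80/9 for nine folds, 81/8 for the last
```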
3.4. Water data

In this case the data set was collected using a smaller portable four-element electronic nose rather than the 12-element system used to collect the coffee data. There were in all 60 different patterns for six different types of water. The headspaces of two vegetable-smelling waters (types A and B), a musty water, a bakery water, a grassy water and a plastic water were analysed. Taking 10 folds again (rotating the patterns in the training and testing sets), the network was trained with 54 patterns at any one time and tested with the remaining six patterns. Each pattern consisted of four sensor values. The neural network used had a 4 × 6 × 6 architecture, just like its fuzzy counterpart.
4. Data analysis using fuzzy neural model

In order to illustrate how a fuzzy neural model works, let us consider the above problem of discriminating between a set of different coffee samples. The first step is to define the training and testing sets. The training set can contain 27 patterns of each coffee (i.e., A, B and C), a total of 81 patterns (about 90% of the patterns), and a testing set of two or three patterns of each type, a total of eight or nine (10% of all patterns). The next step is to obtain the starting weights, which are no longer random weights as in conventional networks. These will be obtained using possibility distribution functions (see Fig. 1). It is possible to use the permutations of different coffees with different sensors to yield many distributions (e.g., 36 different distributions can be drawn with three different coffees and 12 sensors). In order to find the weights, a choice must be made of which coffee patterns will be used to generate weights (since sensor values of coffees A, B and C differ significantly, only one coffee type can yield membership values). We chose coffee A data to assist in this process, since the sensors have registered higher values than in the case of coffees B and C (since medium-roasted coffees contain more volatile molecules than darker-roasted ones) and noise levels here are supposed to be higher. Out of the 27 patterns used for training, one pattern, called P, is taken out at random. The remaining 26 patterns are used to generate the distribution for each sensor (i.e., a total of 12). The formula used for such a process is described by Zadeh [5] and shown in Fig. 1. It may be seen that the possibility of occurrence of any measurement decreases quadratically as it gets further away from the mean value. The variable B in the formula is the measurement for which the possibility value is 0.5 and is also known as the 'cross-over' point. A further explanation of the details of the formula can also be found in Mamdani and Gaines [14]. Once all of the distributions have been generated (D_1, D_2, ..., D_12), the membership of the sensor values in pattern P (s_1, s_2, ..., s_12) is determined. This means we find the membership of s_i in distribution D_i (let us say it is m_i) for P.

Now let us describe the network mathematically. The input nodes can be defined by a vector l, the hidden nodes by a vector m and the output nodes by a vector n. The membership value m_i serves as a weight between l_i and all nodes of m. Hence we can determine the weights of all the neurons connecting the input layer to the hidden layer.

A very similar approach is adopted for finding the weights connecting the hidden layer to the output layer, but rather than using the sensor value distributions, the hidden-node output distributions are used. In order to obtain these (if two-layer networks are being used), the network needs to be initially trained for a few iterations with random weights in the non-fuzzy mode. The hidden-node outputs can then be separately analysed following the steps given above.
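The weight-seeding procedure described in the last few paragraphs can be summarized in a short sketch. This is our own reading of the text, not the authors' code: coffee_a_patterns is assumed to be a (27, 12) array of coffee A training patterns, one pattern P is held out, per-sensor statistics are taken over the remaining 26 patterns, and the membership m_i of P's i-th sensor value seeds every weight leaving input node i. The membership helper is the one sketched after Fig. 1.

```python
import numpy as np

def initial_input_hidden_weights(coffee_a_patterns, held_out=0, n_hidden=3):
    """Seed the input-to-hidden weights from per-sensor possibility distributions.

    One pattern P is removed from the (assumed) 27 x 12 array of coffee A patterns; the mean Y,
    minimum X and bandwidth B = (X + Y)/2 of each sensor are computed from the remaining 26
    patterns (as in Section 4.1), and the membership m_i of P's i-th sensor value becomes the
    starting weight between input node i and every hidden node.
    """
    patterns = np.asarray(coffee_a_patterns, dtype=float)
    p = patterns[held_out]                           # the held-out pattern P
    rest = np.delete(patterns, held_out, axis=0)     # the remaining 26 patterns

    y_mean = rest.mean(axis=0)
    x_min = rest.min(axis=0)
    bandwidth = (x_min + y_mean) / 2.0

    # membership() is the helper sketched after Fig. 1.
    m = np.array([membership(p[i], y_mean[i], bandwidth[i]) for i in range(p.size)])
    return np.tile(m[:, None], (1, n_hidden))        # shape (n_sensors, n_hidden), here (12, 3)
```

The hidden-to-output weights would be seeded in the same way, but from distributions of the hidden-node outputs collected after a few non-fuzzy training iterations, as described above.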
4.1. Example

Let us see the role of the possibility distribution in the Sensor 1 data for coffee A. We have chosen the first 26 values and found the following statistics:

n = 26
Mean (Y) = 0.0706
Min (X) = 0.0564
B = (X + Y)/2 = 0.0635

Let us find the membership value of two measurements chosen at random, v = 0.076 and v = 0.1011. (Please refer to the formula in Fig. 1 for the following calculation. A membership value is the possibility that v is a member of the set of all 26 Sensor 1 values.)

When v = 0.076,

M = 1 - S(0.076; 0.0706, 0.10235, 0.1341)
  = 1 - 0.0144
  = 0.985

(This is expected, since the membership value of any value very close to the mean is nearly 1.)

When v = 0.1011,

M = 1 - S(0.1011; 0.0706, 0.10235, 0.1341)
  = 1 - 0.4628
  = 0.537
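These two values can be checked directly with the upper-half formula of Fig. 1(b), M = 1 - S(v; Y, Y + B/2, Y + B); the few lines below are our own verification, not part of the original analysis.

```python
Y, B = 0.0706, 0.0635                  # mean and bandwidth of the 26 Sensor 1 values
for v in (0.076, 0.1011):
    # Both test values lie between Y and the cross-over point Y + B/2, so only the
    # rising branch of the S-function is needed here (its denominator (Y + B) - Y = B).
    S = 2 * ((v - Y) / B) ** 2
    print(v, 1 - S)
# Prints memberships close to the 0.985 and 0.537 quoted above.
```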
5. Results

It was evident that the sensor outputs were non-linear in concentration and contained significant errors attributable to systematic noise. Initially, after trying several different training algorithms and architectures on a non-fuzzy neural network, the success rate was no better than 86% on the coffee data² and no better than 75% on the water data.

Tables 2 and 3 summarize the results of our data analysis, and show the superior performance of the fuzzy neural model when compared with the BP technique. Note that when the difference between the final output value and the desired value of any output-layer node was above the error tolerance limit, it was tagged as misclassified. If more than half of the nodes in the output layer were misclassified in this way, the whole pattern was counted as misclassified.
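The misclassification criterion just stated can be phrased as a small counting routine. The tolerance value and the array layout are not given in the paper, so both are assumptions here; this is a paraphrase of the rule, not the authors' code.

```python
import numpy as np

def count_misclassifications(outputs, targets, tolerance=0.2):
    """Count misclassified nodes and patterns for arrays of shape (n_patterns, n_output_nodes).

    A node is misclassified when |output - target| exceeds the error tolerance; a pattern is
    misclassified when more than half of its output nodes are. The default tolerance is an
    arbitrary placeholder, not a value from the paper.
    """
    errors = np.abs(np.asarray(outputs, float) - np.asarray(targets, float)) > tolerance
    bad_nodes = int(errors.sum())
    bad_patterns = int((errors.sum(axis=1) > errors.shape[1] / 2).sum())
    return bad_patterns, bad_nodes
```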
A t-test was then performed to compare the numbers of misclassified nodes and patterns using the FNN model and the back-propagation model for the coffee and water data. In the case of the coffee data, the hypothesis H₀ was comfortably rejected at the 5% significance level (t = -3.86, p = 0.002 for patterns³ and t = -3.50, p = 0.0034 for nodes). The same result was obtained for the water data (t = -5.01, p = 0.0004 for patterns and t = -3.35, p = 0.0042 for nodes). This shows that our FNN is a significantly better technique than the conventional BP network.
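The quoted statistics are consistent with a paired t-test across the ten folds (nine degrees of freedom, as footnote 3 indicates, with a one-sided alternative). The sketch below is our own reconstruction rather than the authors' code; using SciPy and the per-fold counts of Table 2 it reproduces the coffee-data t values quoted above.

```python
from scipy import stats

# Per-fold misclassification counts for the coffee data, taken from Table 2.
fnn_patterns = [1, 0, 1, 1, 1, 1, 0, 0, 1, 1]
bp_patterns  = [1, 1, 2, 3, 1, 1, 1, 1, 3, 2]
fnn_nodes    = [2, 0, 2, 2, 2, 2, 0, 0, 2, 1]
bp_nodes     = [2, 2, 4, 6, 2, 2, 1, 2, 5, 2]

# Paired, one-sided test of H0 "no difference" against "the FNN misclassifies fewer".
# (The 'alternative' keyword requires SciPy >= 1.6.)
for label, fnn, bp in (("patterns", fnn_patterns, bp_patterns),
                       ("nodes", fnn_nodes, bp_nodes)):
    result = stats.ttest_rel(fnn, bp, alternative="less")
    print(f"coffee {label}: t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# Prints t = -3.86 for patterns and t = -3.50 for nodes, with p-values of roughly
# 0.002 and 0.003, in line with the text; the water data of Table 3 can be treated
# in the same way.
```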
Table 2
Results of analysing the coffee data. 81 patterns were used for training, with nine patterns tested in each fold

Fold   Patterns misclassified   Nodes misclassified   Patterns misclassified   Nodes misclassified
       by FNN                   by FNN                by BP                    by BP
1      1                        2                     1                        2
2      0                        0                     1                        2
3      1                        2                     2                        4
4      1                        2                     3                        6
5      1                        2                     1                        2
6      1                        2                     1                        2
7      0                        0                     1                        1
8      0                        0                     1                        2
9      1                        2                     3                        5
10     1                        1                     2                        2
Total  7                        13                    16                       28

Table 3
Results of analysing the tainted-water data. 54 patterns were used for training, with six patterns tested in each fold

Fold   Patterns misclassified   Nodes misclassified   Patterns misclassified   Nodes misclassified
       by FNN                   by FNN                by BP                    by BP
1      1                        3                     3                        5
2      1                        2                     2                        3
3      0                        0                     1                        1
4      2                        3                     2                        3
5      1                        2                     2                        2
6      0                        0                     1                        2
7      1                        2                     3                        4
8      1                        2                     3                        4
9      1                        2                     1                        2
10     1                        2                     1                        2
Total  9                        18                    19                       28

² Note that linear discriminant function analysis yielded a value of only 80%; see Ref. [13].
³ The critical t value at the 5% significance level and 9 degrees of freedom is 1.83.

6. Conclusions

Fuzzy neural networks (FNNs) have been shown to manage uncertainty in real-world sensor data. Their performance on electronic nose data was found to be superior to that of their non-fuzzy neural counterparts. We believe that this was due to the possibility distribution for weight determination averaging out the uneven uncertainty found in the poor semiconducting oxide gas sensors. This is especially important when there is a huge search space and a good starting point is required. The performance given by non-fuzzy networks depends on the initial set of random weights or other training parameters. In our comparison we used a good non-fuzzy back-propagation network, and so our FNN results would be even more favourable if compared with a 'vanilla' back-propagation network. FNNs are generic and so may be applied to areas in which standard neural networks are currently employed. In conclusion, the introduction of fuzzy parameters into conventional neural networks can offer a significant advantage when solving difficult classification problems such as that presented by electronic nose instrumentation.
[5] L.A. Zadeh, Fuzzy Logic and Its Applications, Academic Press, New York, 1965, pp. 29-33.
[6] D. McNeill and P. Freiberger, Fuzzy Logic, Touchstone Books, New York, 1993.
[7] D. Dubois and H. Prade, Fuzzy Sets and Systems, Vol. 144, Academic Press, New York, 1980.
[8] M.M. Gupta and J. Qi, On fuzzy neuron models, in L.A. Zadeh and J. Kacprzyk (eds.), Fuzzy Logic for the Management of Uncertainty, John Wiley, New York, 1992, pp. 479-490.
[9] S. Singh, Fuzzy neural networks for managing uncertainty, M.Sc. Dissertation, University of Warwick, UK, 1993.
[10] J.W. Gardner and P.N. Bartlett, A brief history of electronic noses, Sensors and Actuators B, 18-19 (1995) 211-220.
[11] J.W. Gardner, E.L. Hines and M. Wilkinson, Application of artificial neural networks in an electronic nose, Meas. Sci. Technol., 1 (1990) 446-451.
[12] H.V. Shurmer, J.W. Gardner and H.T. Chan, The application of discrimination techniques to alcohols and tobaccos using tin oxide sensors, Sensors and Actuators, 18 (1989) 361-371.
[13] J.W. Gardner, H.V. Shurmer and T.T. Tan, Application of an electronic nose to the discrimination of coffees, Sensors and Actuators B, 6 (1992) 71-75.
[14] E.H. Mamdani and B.R. Gaines (eds.), Fuzzy Reasoning and its Applications, Academic Press, New York, 1981.

Biographies

Sameer Singh was born in New Delhi, India in 1970. After receiving his BE in computer engineering with distinction from BIT, India, he got his M.Sc. in information technology for manufacturing from Warwick University in 1993. He has recently completed his Ph.D. in the application of artificially intelligent methods for quantifying recovery in speech and language disorders. He is now a lecturer in the School of Computing, University of Plymouth, UK. He is a fellow of the Royal Statistical Society (UK) and an associate member of the Institution of Electrical Engineers (UK). His main research interests include neural networks, fuzzy logic, expert systems and linguistic computing.

Evor Hines is a senior lecturer in electronics in the Department of Engineering at the University of Warwick. His research interests include intelligent systems engineering areas such as artificial neural networks, genetic algorithms and fuzzy logic. Over the last seven or so years he has been involved in work applying these techniques in areas such as sensor data processing (e.g., electronic nose), medical data processing (e.g., spectral imaging, impedance imaging) and business data processing (e.g., stock prediction, sales forecasting), amongst others. He has been involved in the publication of more than 70 papers.

Julian W. Gardner was born in Oxford, UK in 1958. He received his B.Sc. with highest honours in 1979 from Birmingham University and the Ph.D. degree from Cambridge University in 1983. His dissertation focused on electron conduction in thin-film devices. From 1983 to 1987 he was in industry, where he worked on instrumentation and sensors. He is currently a reader in microengineering in the Department of Engineering at Warwick University. His research interests include the modelling of electron devices, silicon microsensors, chemical sensor array devices and electronic noses. He is author or co-author of over 150 technical papers and has recently published a book on microsensors. Dr Gardner is a member of the Institution of Electrical Engineers (UK) and the Institute of Electrical and Electronics Engineers (US). In 1989 he received an Esso Centenary Education Award from the Royal Society and Fellowship of Engineering, London and was, in 1994, an Alexander von Humboldt Research Fellow in Germany.