Rafael ColasMarquez, PhD Thesis (Approved)
By: Supervisor:
September 2019
Abstract
This thesis focuses on two expansions of the traditional fuzzy set: complex fuzzy sets and fuzzy rough sets. Complex fuzzy sets add context to linguistic variables, resulting in compact models capable of describing the interaction between features and outputs as interferences. The developed complex fuzzy inference systems are demonstrated to be transparent and interpretable, with an increase of up to 10% in prediction accuracy compared with state-of-the-art fuzzy modelling approaches and up to a threefold reduction in training time. Further advances are presented in the development of a complex Gaussian membership function to model uncertainties. Expanding the model to the complex domain presents further advantages, including the application of complex-valued statistics to the development of a feature selection algorithm. Fuzzy rough sets are implemented for identifying inconsistencies in datasets. The models and algorithms developed in this work are applied to four real-world datasets, demonstrating their applicability in different areas. The first two datasets are material testing datasets obtained from industrial applications; the third dataset contains the information of a survival analysis performed on patients suffering from bladder cancer; the fourth dataset describes the critical temperature of superconductors.
Publications
Contents
List of Figures ............................................................................................................... ix
List of Tables .............................................................................................................. xiv
List of Algorithms ......................................................................................................xvii
Abbreviations ........................................................................................................... xviii
Chapter 1 Motivation and Thesis Overview ............................................................... 1
1.1 Motivation and Introduction .......................................................................... 1
1.2 Thesis Overview ............................................................................................ 2
Chapter 2 State of the Art ........................................................................................... 5
2.1 Fuzzy Sets and Fuzzy Logic .......................................................................... 5
2.1.1 Fuzzy Membership Functions .................................................................... 7
2.1.2 Fuzzy Logic Operators............................................................................... 7
2.1.3 Fuzzy Rules and Inference ......................................................................... 9
2.2 Fuzzy Inference Systems ............................................................................... 9
2.2.1 Mamdani Fuzzy Inference Systems ......................................................... 10
2.2.2 TSK Fuzzy Inference Systems ................................................................. 11
2.2.3 Single Input Fuzzy Inference Systems ..................................................... 13
2.2.4 Fuzzy Rule-Base Elicitation .................................................................... 15
2.3 Neuro-Fuzzy Inference Systems .................................................................. 17
2.3.1 Artificial Neural Networks ...................................................................... 17
2.3.1.1 The Error-Backpropagation Algorithm............................................ 18
2.3.1.2 Radial Basis Function Networks...................................................... 19
2.3.2 Neuro Fuzzy Mamdani Fuzzy Inference System ..................................... 20
2.3.3 The Adaptive Network Based Fuzzy Inference System .......................... 22
2.4 Type-2 Fuzzy Sets........................................................................................ 24
2.5 Rough Sets ................................................................................................... 25
2.5.1 Fuzzy Rough Set Theory ......................................................................... 29
8.4 Multiple Point Prediction for Datasets Containing Inconsistencies .......... 183
8.4.1 Results .................................................................................................... 184
8.5 Data-Mining Utilizing Fuzzy Rough Sets- Application to The Bladder Cancer
Dataset.................................................................................................................... 187
8.6 Summary .................................................................................................... 188
Chapter 9 Conclusions and Future Work ............................................................... 189
9.1 Conclusions ................................................................................................ 189
9.2 Future Work ............................................................................................... 191
References .................................................................................................................. 193
List of Figures
Figure 2.1: Oven temperature example to compare fuzzy sets and crisp sets. .............. 6
Figure 2.2: Gaussian, triangular and singleton membership functions. ......................... 7
Figure 2.3: Two-dimensional grid-partition with three membership functions per feature. .............................................................................................................. 16
Figure 2.4: Two-dimensional cluster rule-base. .......................................................... 16
Figure 2.5: One hidden layer feedforward ANN. ........................................................ 18
Figure 2.6: Single output RBFN a) weighted sum output and b) weighted average
output. .................................................................................................................. 20
Figure 2.7: ANFIS schematic. ..................................................................................... 23
Figure 2.8: Rough set representation. .......................................................................... 28
Figure 2.9: ANCFIS schematic. ................................................................................... 36
Figure 2.10: ACNFIS schematic. ................................................................................. 39
Figure 3.1: Charpy impact test DBTT curve. .............................................................. 46
Figure 3.2: Charpy Impact partial correlation plot. ..................................................... 48
Figure 3.3: Ultimate Tensile Strength Partial correlation plot. .................................... 50
Figure 3.4: [102] Illustration of right censoring: patients A and B outlived the study; patient C was lost due to an unrelated event; patient E withdrew from the study. The records of patients A and F are the only ones not censored, as the time of death from the event of interest occurred within the duration of the study and the recorded time is equal to the observed time. In this example patient C's last observed time is 20 months, as the observation period began at the 20th month and the patient was lost at the 40th month. ..................................................................................................... 51
Figure 4.1: The SICFIS schematic. .............................................................................. 58
Figure 4.2: (a) Initial grid partition for a feature p. (b) Initial vector assigned to the output of a rule, with a length equal to the magnitude parameter and a phase equal to the phase parameter of partition s of feature p. ...... 60
Figure 4.4: Vector partition plot for Carbon (C), Iron (Fe) and the process “X”. ....... 68
Figure 4.5: Cosine distance matrix plot for Carbon (C), Iron (Fe) and the process “X”.
.............................................................................................................................. 69
Figure 4.6: Magnitude Phase plots for Carbon (C), Iron (Fe) and the process “X”. ... 69
Figure 4.7: Resultant vector for high carbon steel, medium carbon steel with process
“X” and high carbon steel with process “X”. ...................................................... 69
Figure 4.8: Charpy recursive backpropagation RMSE at each epoch. ........................ 71
Figure 4.9: Charpy batch backpropagation RMSE at each epoch. .............................. 72
Figure 4.10: Charpy LM RMSE at each epoch............................................................ 73
Figure 4.11: The fast-SICFIS schematic...................................................................... 75
Figure 4.12: Charpy impact dataset, training, checking and testing performance for different numbers of epochs for the normalized and fast SICFIS models. ........... 76
Figure 4.13: Charpy impact dataset, training times for the normalized and fast SICFIS models for different numbers of epochs. ............................................................... 77
Figure 4.14: Charpy Impact test, results regression plot, normalized-SICFIS model with
6 membership functions partitions per feature..................................................... 80
Figure 4.15: Charpy Impact test, results regression plot, fast-SICFIS model with 5
membership functions partitions per feature........................................................ 81
Figure 4.16: UTS test, results regression plot, normalized-SICFIS model with 6
membership functions partitions per feature........................................................ 83
Figure 4.17: UTS test, results regression plot, fast-SICFIS model with 5 membership
functions partitions per feature. ........................................................................... 84
Figure 4.18: Normalized-SICFIS 2 membership functions ROC curves. ................... 87
Figure 4.19: Normalized-SICFIS 2 membership functions scores scatter plot. .......... 88
Figure 4.20: Fast-SICFIS 4 membership functions ROC curves. ................................ 88
Figure 4.21: Fast-SICFIS 4 membership functions scores scatter Plot. ...................... 89
Figure 4.22: Two-dimensional magnitude and phase scatter plot of results................ 92
Figure 4.23: Charpy impact test magnitude-phase plots.............................................. 92
Figure 5.1 Fuzzy partition coefficient values given different clusters and changing the
fuzzy partition exponent value. .......................................................................... 102
Figure 5.2: The real-ANFIS-SICFIS schematic......................................................... 104
Figure 5.3: The complex-ANFIS-SICFIS schematic. ................................................ 106
Figure 5.4: Real and Complex ANFIS-SICFIS global performance for the three optimization processes given 2, 3 and 4 rules. Stacked bar chart. ......................... 111
Figure 5.5: Real and Complex ANFIS-SICFIS local performance for the three optimization processes given 2, 3 and 4 rules. Stacked bar chart. ......................... 111
Figure 5.6 Training times for the complex-ANFIS-SICFIS model utilizing the alternate,
consequent and complete parameter optimization method with a varying number
of rules and membership functions (mF). Overlapping bar chart. ..................... 112
Figure 5.7: Effect of membership functions on performance. ..................................... 114
Figure 5.8: Charpy Impact complex ANFIS-SICFIS global performance 2 rules. ... 115
Figure 5.9: Charpy Impact complex ANFIS-SICFIS local performance 2 rules. ...... 116
Figure 5.10: Effect of membership functions on performance. ................................... 118
Figure 5.11: UTS complex ANFIS-SICFIS global performance 5 rules. .................. 119
Figure 5.12: UTS complex ANFIS-SICFIS local performance 5 rules. .................... 120
Figure 5.13: Bladder cancer ROC curves for the global (a) and local performance (b).
............................................................................................................................ 122
Figure 5.14: Bladder Cancer Global Scores. ............................................................. 123
Figure 5.15: Bladder Cancer Local Scores. ............................................................... 123
Figure 6.1: Two-dimensional view of a Gaussian and a singleton membership function, center b = 0.5 and σ = 0.2. ................................................................... 130
Figure 6.2: Three-dimensional view of a singleton membership function, center b = 0.5
Figure 6.4: Three-dimensional view of a complex Gaussian membership function and the corresponding real and imaginary projections. Center c = 0.5, spread σ = 0.2 and …
Figure 7.5: UTS Fuzzy-rough sets Backward elimination feature selection results. 159
Figure 7.6: Bladder Cancer Fuzzy-rough sets Backward elimination feature selection results. ................................................................................................................ 160
Figure 7.7: Charpy Impact Magnitude Phase Plots. .................................................. 161
Figure 7.8: Charpy Impact normalized complex-valued output prediction varying:
Carbon (C), Sulphur (S), Nickel (Ni) and tempering temperature (T. Temp). .. 162
Figure 7.9: Charpy impact test feature histogram. ..................................................... 163
Figure 7.10: Charpy SICFIS-Filter feature selection results. .................................... 168
Figure 7.11: UTS SICFIS-Filter feature selection results. ......................................... 170
Figure 7.12: Bladder Cancer SICFIS-Filter feature selection results. ....................... 171
Figure 7.13: Charpy Results Comparisons between Filter-SICFIS methods, Wrapper-
SICFIS and Fuzzy Rough sets ........................................................................... 173
Figure 7.14: UTS Results Comparisons between Filter-SICFIS methods, Wrapper-
SICFIS and Fuzzy Rough sets ........................................................................... 173
Figure 7.15: Cancer Results Comparisons between Filter-SICFIS methods, Wrapper-
SICFIS and Fuzzy Rough sets ........................................................................... 174
Figure 8.1: Effect of the number of features in Feature Dependency. ....................... 180
Figure 8.2: Effect of different numbers of features and different threshold values on the number of inconsistencies. ............................................................................ 180
Figure 8.3: Example of a KNN classification utilizing Euclidean distances. If k = 1 or 5, the test sample will be classified as a circle; if k = 3, the test sample is classified as a square; tie resolution is problem dependent. ............................................... 182
Figure 8.4: Effect of inconsistencies on Charpy impact prediction (a) and UTS prediction (b). ...................................................................................................... 183
Figure 8.5: Charpy Impact test prediction interval for consistent testing partition. .. 186
Figure 8.6: Charpy Impact test prediction interval for inconsistent testing partition.186
List of Tables
Table 2.1 Mamdani FIS rule-base................................................................................ 10
Table 2.2: TSK FIS rule-base. ..................................................................................... 11
Table 2.3: SIRM rule-base. .......................................................................................... 14
Table 2.4: Information table example. ......................................................................... 25
Table 3.1 Charpy Impact Dataset information. ............................................................ 47
Table 3.2: UTS dataset information. ............................................................................ 49
Table 3.3 Bladder Cancer dataset information. ........................................................... 52
Table 4.1: Complex fuzzy rule-base to determine voter turnout in an election. .......... 56
Table 4.2: Example of a SICFIS rule-base. ................................................................. 65
Table 4.3: Example of the derived grid-partition rule-base from the SICFIS rule-base.
.............................................................................................................................. 65
Table 4.4: SICFIS rule-base......................................................................................... 67
Table 4.5: Grid partition rule-base. .............................................................................. 68
Table 4.6: Charpy impact dataset parameter grid. ....................................................... 78
Table 4.7 Charpy Impact Normalized-SICFIS Results Summary. .............................. 78
Table 4.8: Charpy Impact Fast-SICFIS Results Summary. ......................................... 78
Table 4.9: Charpy Impact SICFIS Best Results........................................................... 79
Table 4.10 Charpy Impact Results Comparison. ......................................................... 79
Table 4.11: Charpy Impact, initial FIS and training computation times in seconds. ... 81
Table 4.12: UTS parameter grid. ................................................................................. 82
Table 4.13: UTS normalized-SICFIS UTS results summary....................................... 82
Table 4.14: UTS fast-SICFIS UTS results summary. .................................................. 83
Table 4.15: UTS Normalized and Fast SICFIS UTS Best Results. ............................. 84
Table 4.16: UTS results comparison. ........................................................................... 85
Table 4.17: Bladder Cancer Parameter Grid. ............................................................... 86
Table 4.18 Normalized-SICFIS Bladder Cancer Results Summary. ........................... 86
Table 4.19: Fast-SICFIS Bladder Cancer Results Summary. ...................................... 86
Table 4.20: Normalized and Fast SICFIS Bladder Cancer Best Results. ..................... 87
Table 4.21: Bladder Cancer Results Comparison. ....................................................... 87
Table 4.22: Normalized-SICFIS 2 membership functions Confusion Matrix. ............ 89
Table 4.23: Fast-SICFIS 4 membership functions Confusion Matrix. ........................ 89
Table 5.1: ANFIS-SICFIS Rule-base. ......................................................................... 99
Table 5.2: Parameter grid search. .............................................................................. 110
Table 5.3: Parameter grid search for the Charpy impact test. .................................... 113
Table 5.4: Charpy Mean RMSE results given different number of rules. ................. 113
Table 5.5: Charpy Standard deviation results given different number of rules. ........ 113
Table 5.6: Charpy Best results given different number of rules. ............................... 115
Table 5.7: Charpy results comparison. ...................................................................... 116
Table 5.8: UTS mean of results given different number of rules. ............................. 117
Table 5.9: UTS standard deviation of results given different number of rules. ......... 117
Table 5.10: UTS Best results given different number of rules. ................................. 118
Table 5.11: UTS result comparisons .......................................................................... 118
Table 5.12: Parameter grid search for the Bladder Cancer dataset. ........................... 121
Table 5.13: Bladder Cancer Mean results. ................................................................. 121
Table 5.14: Bladder Cancer standard deviation results. ............................................ 121
Table 5.15: Bladder Cancer best results given a number of rules and membership
functions............................................................................................................. 122
Table 5.16 Bladder Cancer Results Comparison. ...................................................... 122
Table 6.1 Complex and Type-1 defuzzification ........................................................ 136
Table 6.2: Mamdani SICFIS rule-base ...................................................................... 138
Table 6.3: Charpy impact Mamdani-SICFIS parameter grid..................................... 141
Table 6.4: Charpy Impact Mamdani-SICFIS Results Summary. .............................. 141
Table 6.5: Charpy Impact Mamdani-SICFIS Best Results. ....................................... 142
Table 6.6: UTS Mamdani-SICFIS parameter grid..................................................... 143
Table 6.7: UTS Mamdani-SICFIS results summary. ................................................. 143
Table 6.8: UTS Mamdani-SICFIS best results. ......................................................... 143
List of Algorithms
Algorithm 4.1: SICFIS initialization............................................................................ 61
Algorithm 4.2: Levenberg-Marquardt optimization .................................................... 74
Algorithm 5.1 Fuzzy C-Means clustering algorithm ................................................. 100
Algorithm 5.2: Local Performance Evaluation .......................................................... 109
Algorithm 7.1: Backward elimination algorithm. ...................................................... 152
Algorithm 7.2: Forward selection algorithm. ............................................................ 152
Algorithm 8.1: Data selection for training M SICFIS models to perform the multiple
point prediction. ................................................................................................. 185
Abbreviations
ACNFIS Adaptive Complex Neuro Fuzzy Inferential System
AI Artificial Intelligence
GA Genetic Algorithm
LM Levenberg-Marquardt
PC Partial Correlation
Q Quadrant
SD Standard Deviation
TSK Takagi-Sugeno-Kang
Chapter 1
Motivation and Thesis Overview
The development and application of machine learning and Artificial Intelligence (AI) models have increased significantly in the last decade. With the application of such algorithms to high-impact areas such as medical diagnosis and manufacturing, it is important to develop models that are not only accurate but also interpretable, based on human intuition. The increased availability of high computing power has made it feasible to develop complex machine learning models capable of surpassing human performance in certain applications, as is the case with deep Artificial Neural Networks (ANN) [1]. Many of these algorithms are being deployed in sensitive areas such as medicine [2] and finance [3]. The problem with such complex machine learning algorithms remains the inability to interpret the inference process of black-box models. In recent years the European Union's General Data Protection Regulation included a section known as the "right to explanation". These laws may have a serious impact on the accountability of companies and industries that use machine learning and AI algorithms, potentially leading to the development of laws requiring the utilization of interpretable machine learning models or the development of tools to interpret the inference process of black-box models [4].
Fuzzy logic was developed with the intention of modelling human reasoning [5]. Fuzzy Inference Systems (FIS) are AI models capable of describing a system utilizing a rule-base composed of linguistic variables [6]. Compared with black-box models, FISs are
known to be transparent and interpretable given their closeness to human natural language. The transparency of a FIS assures the applicability of the model within a range of operations, while its interpretability allows the model to be validated by experts and makes it possible to extract valuable information from a dataset to derive conclusions and make decisions [7].
The models and tools developed are implemented using four different real-world datasets. The first two are industrial datasets, containing information from two common material tests: the Charpy impact test and the Ultimate Tensile Strength (UTS) test. The third dataset is a medical dataset obtained from a survival study of patients suffering from bladder cancer. The fourth dataset describes the critical temperature of superconductors.
Each one of the datasets studied in this work presents different challenges. Applying the tools developed to such different datasets demonstrates their generalization properties and the possibility of expanding the application of such tools to other areas.
Chapter 2 contains the literature review surveyed in this work. A brief overview of fuzzy logic and fuzzy sets is provided, followed by a review of the different types of FISs, including neuro-FISs. New advances in the expansion of fuzzy sets are later introduced, including rough sets and complex fuzzy sets (CFS), the focus of this work. The chapter concludes with an overview of interpretability.
Chapter 3 includes detailed information regarding the four datasets studied in this thesis. The first two are material testing datasets obtained from Charpy impact and UTS tests. The third dataset is a survival study performed on patients suffering from bladder cancer. The fourth dataset contains information related to the critical temperature of superconductors.
Chapter 4 introduces the Single Input Complex Fuzzy Inference System (SICFIS). The SICFIS is a FIS with a single feature partition per rule. The concept of interference is exploited to represent the complex interaction between features and outputs. The SICFIS model is shown to be transparent and interpretable, with a performance superior to state-of-the-art fuzzy models.
Chapter 5 improves the well-known Adaptive Neuro Fuzzy Inference System (ANFIS) model by substituting the linear regression consequents with SICFIS models. The ANFIS-SICFIS therefore becomes a global model composed of local interpretable SICFISs; the results obtained are comparable with ensemble-ANN and evolutionary-ANN models. The interpretability of the model is assessed by using a local-global performance index.
Chapter 7 presents the development of a filter method for feature selection based on the SICFIS model developed in Chapter 4. The results obtained are comparable with those of known feature selection algorithms, with a considerably reduced computing time.
In Chapter 8, fuzzy rough sets are utilized for data-mining applications on the Charpy impact test dataset and the bladder cancer dataset. Fuzzy rough sets offer a novel tool to obtain deeper insight into the datasets and extract valuable information for developing prediction models.
Chapter 9 presents the conclusions and the future work in the field of complex FIS.
Chapter 2
State of the Art
Fuzzy sets and fuzzy logic were developed by Zadeh [5] to model and approximate human reasoning. Fuzzy sets have a continuum of membership grades between 0 and 1, which allows the representation of vagueness and uncertainty in human natural language and in real-world objects. While traditional sets classify objects with an absolute membership value of either belonging or not belonging to a class (true or false; 1 or 0), statements such as "the oven is hot" are not intuitively represented as either completely true or false. For example, an oven at a temperature of 160° can be considered "hot", or even "very hot", while another oven at a temperature of 175° may be considered to be between "hot" and "very hot". Traditional logic is not capable of representing such statements as intuitively as fuzzy logic. Because of the continuum of membership degrees, it is possible to define "soft" boundaries between classes, allowing for an intuitive transition between class membership and the changes in a feature. This contrasts with traditional logic, which can be considered as having "hard" boundaries, where a small change in a feature could mean a complete change in class membership; for example, an oven whose temperature changes from 174° to 176° would change its class membership from "hot" to "very hot" instantly.
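The soft-versus-hard boundary contrast described above can be sketched numerically. The following is an illustrative example only: the class centre and spread values are arbitrary choices, not parameters taken from this chapter.

```python
from math import exp

def crisp_hot(t):
    """Hard boundary: class flips instantly at 175 degrees."""
    return "very hot" if t >= 175 else "hot"

def fuzzy_very_hot(t, centre=190.0, spread=15.0):
    """Soft boundary: Gaussian-style membership grade in 'very hot'.
    centre and spread are illustrative assumptions."""
    return exp(-0.5 * ((t - centre) / spread) ** 2)

# A 2-degree change flips the crisp class but barely moves the fuzzy grade.
for t in (160, 174, 176, 190):
    print(t, crisp_hot(t), round(fuzzy_very_hot(t), 2))
```

Note how the fuzzy membership grade changes only slightly between 174° and 176°, while the crisp classification jumps between classes.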
A = \{ (x, \mu_A(x)) \mid x \in U \}   (2.1)
where \mu_A(x) is called the membership function for the fuzzy set A. The membership function maps each element of U to a membership grade (or membership value) between 0 and 1. The set U is usually referred to as the universe of discourse.
Figure 2.1: Oven temperature example to compare fuzzy sets and crisp sets.
In the example shown in Figure 2.1 a Gaussian membership function (2.2) is utilized. Other examples of membership functions include the triangular membership function (2.3) and the singleton membership function (2.4), among others. A graphical representation of the Gaussian, triangular and singleton membership functions is shown in Figure 2.2.
Gaussian membership function: \mu_A(x; c, \sigma) = \exp\left( -\frac{1}{2} \left( \frac{x - c}{\sigma} \right)^2 \right)   (2.2)

Triangular membership function: \mu_A(x; a, b, c) = \max\left( \min\left( \frac{x - a}{b - a}, \frac{c - x}{c - b} \right), 0 \right)   (2.3)

Singleton membership function: \mu_A(x; b) = \begin{cases} 1 & \text{if } x = b \\ 0 & \text{if } x \neq b \end{cases}   (2.4)
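A direct Python transcription of equations (2.2)-(2.4) may serve as a quick reference; this is a minimal sketch, not code from the thesis:

```python
from math import exp

def gaussian_mf(x, c, sigma):
    """Gaussian membership function, eq. (2.2): centre c, spread sigma."""
    return exp(-0.5 * ((x - c) / sigma) ** 2)

def triangular_mf(x, a, b, c):
    """Triangular membership function, eq. (2.3): feet a and c, peak b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def singleton_mf(x, b):
    """Singleton membership function, eq. (2.4): grade 1 only at x == b."""
    return 1.0 if x == b else 0.0
```

Each function returns a membership grade in [0, 1], peaking at 1 at the centre of the set.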
Just as in traditional logic and set theory, fuzzy logic utilizes logic operators to
perform a diverse number of operations. Given two fuzzy sets A and B, the fuzzy
intersection and union are as follows:
\mu_{A \cap B}(x) = \mu_A(x) \star \mu_B(x)   (2.5)
\mu_{A \cup B}(x) = \mu_A(x) \oplus \mu_B(x)   (2.6)
where \star and \oplus are known as triangular norm (t-norm) and triangular conorm (t-conorm or s-norm) operations respectively, satisfying the boundary conditions below:
t-norm(a, 0) = 0   (2.7)
t-norm(a, 1) = a   (2.8)
t-conorm(a, 0) = a   (2.9)
t-conorm(a, 1) = 1   (2.10)
Some common t-norm operations are the minimum t-norm, \min(a, b) (2.11), and the product t-norm, ab (2.12). Some common s-norm operations are the maximum s-norm, \max(a, b) (2.13), and the probabilistic sum, a + b - ab (2.14).
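These operators can be expressed in a few lines of Python, and the boundary conditions (2.7)-(2.10) can then be checked directly; this is an illustrative sketch:

```python
def t_min(a, b):
    """Minimum t-norm, eq. (2.11)."""
    return min(a, b)

def t_prod(a, b):
    """Product t-norm, eq. (2.12)."""
    return a * b

def s_max(a, b):
    """Maximum s-norm, eq. (2.13)."""
    return max(a, b)

def s_prob(a, b):
    """Probabilistic sum s-norm, eq. (2.14)."""
    return a + b - a * b

# The boundary conditions (2.7)-(2.10) hold for any membership grade a:
a = 0.3
assert t_prod(a, 0.0) == 0.0 and t_prod(a, 1.0) == a
assert s_prob(a, 0.0) == a and s_prob(a, 1.0) == 1.0
```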
Fuzzy rules are logical statements composed of linguistic variables of the following form:
If x is A Then y is B   (2.15)
\mu_{A \to B}(x, y) = \mu_A(x) \star \mu_B(y)   (2.16)
where \star is a t-norm.
FISs are models created to represent the behaviour of a real-world system utilizing a rule-base composed of the aggregation of fuzzy rules of the form (2.15) [11]. FISs are known to be universal approximators, capable of approximating any continuous function within a given level of accuracy [12]. Additionally, FISs are transparent and interpretable due to their intuitive linguistic modelling. This makes them useful for modelling, representing and extracting knowledge.
The two main FIS types are Mamdani and Takagi-Sugeno-Kang (TSK). The Mamdani FIS utilizes linguistic variables for both the premises and the consequents of the rule-base. TSK FISs utilize linguistic variables for their premises, but the consequents are expressed utilizing a function, usually a linear regression model [13].
The first layer fuzzifies a crisp input utilizing a fuzzy membership function:
O^1_{r,p} = \mu_{r,p}(x_p)   (2.17)
The second layer calculates the firing strength of each rule according to the logic operation stated in the rule-base: for an "And" logical operator a t-norm (\star) function is selected, while in the case of the "Or" operator an s-norm (\oplus) function is utilized.
The third layer is the inference layer, which is calculated utilizing a t-norm function.
The fourth layer aggregates the outputs of the third layer utilizing an s-norm:

O^4 = μ_Q̂(y) = O^3_1(y) ⊕ O^3_2(y) ⊕ … ⊕ O^3_R(y)  (2.20)
For the final layer, it is necessary to defuzzify the output of the fourth layer. Several
functions have been proposed; the one explored in this work is the center of gravity
(COG) defuzzification, which is as follows:

ŷ = Σ_{i=1}^{N} k_i μ_Q̂(k_i) / Σ_{i=1}^{N} μ_Q̂(k_i)  (2.21)

where the k_i are variables with strictly increasing values within the specified output range.
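The discrete COG computation of (2.21) can be sketched as follows, assuming the aggregated output set has been sampled at strictly increasing points k_i (the function name and the example set are hypothetical):

```python
import numpy as np

def cog_defuzzify(k, mu):
    """Discrete centre-of-gravity defuzzification (2.21).
    k  : strictly increasing sample points of the output domain
    mu : aggregated membership degrees mu_Qhat(k_i)"""
    k, mu = np.asarray(k, float), np.asarray(mu, float)
    return float(np.sum(k * mu) / np.sum(mu))

# A symmetric triangular aggregated set centred at 5 defuzzifies to 5:
k = np.linspace(0.0, 10.0, 101)
mu = np.maximum(0.0, 1.0 - np.abs(k - 5.0) / 2.0)
print(cog_defuzzify(k, mu))   # 5.0 by symmetry
```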
The TSK FIS was designed to model the dynamical behaviour of systems [14],
utilizing an ensemble of local linear models. The premise of the rule-base creates a
partition in the feature space, where each rule represents a local linear model of the
described system. The soft boundaries between the rules allow a smooth transition
between the local linear models, creating an accurate and interpretable non-linear
model [13]. An example of a TSK rule-base is shown in Table 2.2.
If x_1 is A_{1,R} And/Or … x_P is A_{P,R} Then y_R = f_R(x) = x_1 b_{1,R} + x_2 b_{2,R} + … + x_P b_{P,R} + b_{0,R}
The overall structure of the TSK rule-base is very similar to that of the Mamdani
FIS. The stages of inference in a TSK FIS are: fuzzification, rule firing strength,
inference and rule aggregation. The consequences of the TSK are linear functions
therefore the output of each rule is a crisp quantity and does not require de-fuzzification.
The TSK FIS can be described as a 5-layered system as follows:
The first layer fuzzifies a crisp input utilizing a fuzzy membership function.
O^1_{r,p} = μ_{r,p}(x_p)  (2.22)
The second layer calculates the rule firing strength of each rule according to the logic
operation stated in the rule-base; for the "And" and "Or" logical operators a t-norm (⋆)
and an s-norm (⊕) function are selected respectively:

O^2_r = w_r = μ_{r,1}(x_1) ⋆ μ_{r,2}(x_2) ⋆ … ⋆ μ_{r,P}(x_P)  (2.23)

The third layer normalizes the rule firing strengths:
O^3_r = w̄_r = w_r / Σ_{r=1}^{R} w_r  (2.24)
The fourth layer performs the rule inference operation, multiplying each normalized firing strength by the output of its linear consequent:

O^4_r = w̄_r y_r = w̄_r f_r(x)  (2.25)
The final layer aggregates each of the inferred rules utilizing an s-norm. Because of the
linear functions utilized as the outputs of the rules in the TSK FIS, it is not necessary to
perform a defuzzification operation.

O^5 = ŷ = O^4_1 ⊕ O^4_2 ⊕ … ⊕ O^4_R  (2.26)
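As a minimal sketch, the five TSK stages (2.22)–(2.26) with Gaussian premises, the product t-norm and summation of the normalized rule outputs as the aggregation (a common choice) can be written as follows; all parameter values are hypothetical:

```python
import numpy as np

def gauss(x, c, s):
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def tsk_predict(x, centres, sigmas, consequents):
    """First-order TSK inference: Gaussian fuzzification, product t-norm
    firing strengths, normalisation and one linear consequent per rule.
    centres, sigmas : (R, P) premise parameters
    consequents     : (R, P+1) rows [b_1, ..., b_P, b_0]"""
    mu = gauss(x, centres, sigmas)                       # layer 1 (2.22)
    w = mu.prod(axis=1)                                  # layer 2 (2.23)
    w_bar = w / w.sum()                                  # layer 3 (2.24)
    y_r = consequents[:, :-1] @ x + consequents[:, -1]   # rule outputs
    return float((w_bar * y_r).sum())                    # layers 4-5 (2.25)-(2.26)

# Two rules, one feature; parameters are hypothetical:
centres = np.array([[0.0], [1.0]])
sigmas = np.array([[0.5], [0.5]])
consequents = np.array([[1.0, 0.0],    # y_1 = x
                        [2.0, 1.0]])   # y_2 = 2x + 1
print(tsk_predict(np.array([0.5]), centres, sigmas, consequents))  # 1.25
```

At x = 0.5 both rules fire equally, so the output is the average of the two local linear models.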
The lack of linguistic variables in the consequents of the rule-base causes the TSK
FIS to be less interpretable than the Mamdani FIS. The loss in interpretability is
compensated by an increase in prediction accuracy and a reduction in computational
time.
The FISs rule-bases explored so far are composed of a series of statements connected
with AND-OR operations. Single input FISs rules are composed of a single premise per
rule. These systems can describe the individual effect of a feature on the output. Two
common single-input FIS are the Single Input Rule Modules (SIRM’s) Connected
Fuzzy Inference Model [15] and the Single Input Connected (SIC) fuzzy inference
method [16].
The SIRM’s Connected Fuzzy Inference Model was proposed in [15] to solve the
problem of combinatorial rule explosion by creating rules composed of a single premise
and a single consequent. Given P features and sp partitions per feature, the SIRMs rule-
base is as follows:
SIRM_{1,s_1}: if x_1 is A_{1,s_1} then y_{1,s_1} = b_{1,s_1}
SIRM_{2,s_2}: if x_2 is A_{2,s_2} then y_{2,s_2} = b_{2,s_2}
⋮
SIRM_{P,s_P}: if x_P is A_{P,s_P} then y_{P,s_P} = b_{P,s_P}
The inference process of the SIRM is as follows. Each feature p is partitioned into
s_p partitions, and the membership degree of each partition is calculated utilizing a
selected membership function; from the rule-base in Table 2.3 this membership
function is as follows:
μ_{p,s_p} = μ_{A_{p,s_p}}(x_p)  (2.27)
The inference of each feature is then calculated utilizing the normalized rule strength
of the feature partitions as follows:
y_p = Σ_{s_p=1}^{S_p} μ_{p,s_p} b_{p,s_p} / Σ_{s_p=1}^{S_p} μ_{p,s_p}  (2.28)
The final output of the system is calculated as the weighted sum of the feature
inferences; the weight parameter w_p is selected to give the relative importance of each
feature:

f(x) = Σ_{p=1}^{P} w_p y_p  (2.29)
The SIC fuzzy inference method utilizes the same rule-base described in Table 2.3;
the main difference lies in the inference process and system output. Instead of utilizing
a weighted sum of per-feature inferences, it normalizes the rule strengths over all
features and partitions, and the system output can be modelled as follows:
f(x) = Σ_{p=1}^{P} Σ_{s_p=1}^{S_p} μ_{p,s_p} b_{p,s_p} / Σ_{p=1}^{P} Σ_{s_p=1}^{S_p} μ_{p,s_p}  (2.30)
The simple structures of the SIRM and SIC fuzzy inference methods make them
computationally efficient given the low number of operations required.
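A sketch of the SIRM (2.27)–(2.29) and SIC (2.30) inference processes with Gaussian partitions might look as follows; all parameter values are hypothetical:

```python
import numpy as np

def gauss(x, c, s):
    return np.exp(-0.5 * ((x - c) / s) ** 2)

def sirm_predict(x, centres, sigmas, b, feature_weights):
    """SIRMs connected inference (2.27)-(2.29): each feature is inferred
    independently and the outputs are combined as a weighted sum."""
    y = []
    for p in range(len(x)):
        mu = gauss(x[p], centres[p], sigmas[p])          # (2.27)
        y.append(np.sum(mu * b[p]) / np.sum(mu))         # (2.28)
    return float(np.dot(feature_weights, y))             # (2.29)

def sic_predict(x, centres, sigmas, b):
    """SIC inference (2.30): one normalisation over all features and partitions."""
    num = den = 0.0
    for p in range(len(x)):
        mu = gauss(x[p], centres[p], sigmas[p])
        num += np.sum(mu * b[p])
        den += np.sum(mu)
    return float(num / den)

# Two features with two partitions each; all parameters are hypothetical:
centres = [np.array([0.0, 1.0]), np.array([0.0, 1.0])]
sigmas = [np.array([0.5, 0.5])] * 2
b = [np.array([0.0, 1.0]), np.array([1.0, 2.0])]
x = np.array([0.0, 1.0])
print(sirm_predict(x, centres, sigmas, b, np.array([0.5, 0.5])))
print(sic_predict(x, centres, sigmas, b))
```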
The rule-base which composes a FIS can be created utilizing different methods. The
utilization of expert knowledge to derive a FIS is the earliest example of rule-base
elicitation [6], where the rule-base is created based on the expert knowledge of a process.
For simpler processes these rule-bases can create accurate and reliable models.
Nowadays it is more common to develop rule-bases automatically utilizing a dataset or
an information system containing the relevant information required to model a system
[13]. Some of the most common methods are grid-partition and cluster-based methods.
Grid partition methods are among the earliest FIS automatic rule-base elicitation
methods. The features and outputs are divided into partitions creating a grid, and the
rule-base is composed of a combination of every feature partition and output. The
number of rules grows exponentially with the addition of features and partitions,
creating what is known as combinatorial rule explosion [17]. An example of a
two-dimensional partition is shown in Figure 2.3. To solve the problem of combinatorial rule explosion,
different techniques have been developed; most commonly rule-bases are developed
from data clusters or granules that produce more accurate and compact models.
Cluster-based methods utilize input and output data from a system or process to
identify patterns. Each cluster results in the formation of a rule; the size and shape of
the membership functions are calculated based on the geometry of each of the
clusters or granules obtained from the data. An example of cluster-based rule elicitation
is shown in Figure 2.4. Two commonly used clustering algorithms are the Fuzzy C-
Means (FCM) clustering algorithm [18]–[23] and the subtractive clustering algorithm
[24]. Other alternatives for creating an initial rule-base from the input/output information
are information granulation algorithms [25], [26] and hierarchical clustering [27].
Eliciting a rule-base utilizing any of the methods previously described does not
necessarily guarantee an optimal performance of the FIS. In order to improve the
performance, it is required to perform a "fine tuning" of the system parameters, such
as changing the shape and position of the membership functions. A manual tuning of
these rules may become intractable as the complexity increases. In order to tune the
parameters of a FIS automatically it is necessary to either utilize global optimization
methods such as genetic algorithms (GA) [28] or to implement learning techniques
utilized in artificial neural networks (ANNs), resulting in the so-called neuro-FISs [29].
A single-hidden-layer ANN computes its output as follows:
y(x, w) = Σ_{k=1}^{K} w^{(2)}_k φ( Σ_{p=1}^{P} w^{(1)}_{kp} x_p + w^{(1)}_0 ) + w^{(2)}_0  (2.31)
where P represents the number of features, K the number of neurons in the hidden
layer, and φ represents the activation function; a common activation function is the
sigmoidal function:

φ(a) = 1 / (1 + e^{−a})  (2.32)

The W parameters are called the weights of the ANN, and the w_0's are defined as the
biases. These W parameters are usually calculated utilizing a learning algorithm such
as error-backpropagation, which iteratively improves the performance based on an
objective function. The objective function utilized in error-backpropagation is the sum
of squared errors:
E = ½ (ŷ − y)²  (2.33)
where ŷ is the estimated output of a model and y is the real output. The weights
and biases of the ANN are updated according to:

w_{t+1} = w_t − η ∇E(w_t)  (2.34)

where w is the vector containing the weights of the ANN, ∇E(w) is the gradient
of the objective function with respect to the weights and η is the step size.
∇E(w_t) = [ ∂E/∂w_1, ∂E/∂w_2, …, ∂E/∂w_N ]ᵀ  (2.35)
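The update rule (2.34) with the gradient (2.35) of the sum-of-squared-errors objective (2.33) can be illustrated on a single linear neuron; the data, step size and iteration count below are arbitrary choices for the sketch:

```python
import numpy as np

# Gradient-descent sketch of (2.33)-(2.35) on y_hat = w*x + b.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
y = 2.0 * x + 0.5                      # target function to recover

w, b, eta = 0.0, 0.0, 0.1
for _ in range(500):
    y_hat = w * x + b
    err = y_hat - y                    # dE/dy_hat for E = 0.5*(y_hat - y)^2
    grad_w = np.mean(err * x)          # components of the gradient (2.35)
    grad_b = np.mean(err)
    w, b = w - eta * grad_w, b - eta * grad_b   # update rule (2.34)

print(round(w, 3), round(b, 3))        # converges towards w ~ 2.0, b ~ 0.5
```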
Radial basis function networks (RBFNs) are a type of ANN with a single hidden layer
whose selected activation function is a Gaussian (2.36). While activation functions
such as the sigmoidal function are meant to activate the neuron once a threshold is
met, RBFNs respond to inputs located in certain regions of the feature space.
φ_i = exp( −½ ( (x_i − c_i) / σ_i )² )  (2.36)
Figure 2.6: Single output RBFN a) weighted sum output and b) weighted average
output.
The output of the RBFN can be either a weighted sum (2.37) (Figure 2.6 (a)) or a
weighted average (2.38) (Figure 2.6 (b)). The similarities between the weighted-average
RBFN and the Mamdani FIS are evident; in the following section it will be
demonstrated that both can be functionally equivalent given certain conditions.
f(x_i) = Σ_{r=1}^{R} b_r φ_r(x_i) = Σ_{r=1}^{R} b_r w_r  (2.37)

f(x_i) = Σ_{r=1}^{R} b_r φ_r(x_i) / Σ_{r=1}^{R} φ_r(x_i) = Σ_{r=1}^{R} b_r w_r / Σ_{r=1}^{R} w_r  (2.38)
The Mamdani FIS can be functionally equivalent to the RBFN under certain conditions
[34], [35]. The first condition is the selection of a fuzzy Gaussian membership function
for the premises. The second condition is to select the algebraic product as the t-norm
operation for the calculation of the rule firing strength and the implication. The third
condition is to aggregate the rules utilizing an algebraic sum operation. Finally, by
selecting a singleton membership function (2.4) for the consequents of the rules and
selecting the COG defuzzification method, the result is a function equivalent to the
weighted average of the RBFN activation function outputs (2.38). It is important to
note that the algebraic sum is not an s-norm; such a modification results in greater
computational efficiency [10] and in a functional equivalence to the RBFN.
The Mamdani FIS with singleton defuzzification can be described as a four-layered
system. The first layer fuzzifies the input (2.39), the second layer calculates the rule
firing strength (2.40), and the third layer calculates the inference (2.41). The final layer
defuzzifies the output utilizing the COG method (2.42).
O^1_{r,p} = μ_{r,p} = exp( −½ ( (x_p − c_{r,p}) / σ_{r,p} )² )  (2.39)

O^2_r = w_r = ∏_{p=1}^{P} μ_{r,p}  (2.40)

O^3_r = w_r b_r  (2.41)

O^4 = ŷ = Σ_{r=1}^{R} b_r w_r / Σ_{r=1}^{R} w_r  (2.42)
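Under the conditions above, the singleton Mamdani forward pass (2.39)–(2.42) reduces to the weighted-average RBFN of (2.38); a minimal sketch, with hypothetical parameter values:

```python
import numpy as np

def mamdani_singleton(x, c, s, b):
    """Forward pass of the four-layer singleton Mamdani FIS (2.39)-(2.42),
    functionally equivalent to a weighted-average RBFN (2.38).
    c, s : (R, P) Gaussian centres and widths; b : (R,) singleton consequents."""
    mu = np.exp(-0.5 * ((x - c) / s) ** 2)   # layer 1 (2.39)
    w = mu.prod(axis=1)                      # layer 2, product t-norm (2.40)
    return float(np.sum(b * w) / np.sum(w))  # layers 3-4 (2.41)-(2.42)

# Two rules over one feature; all parameter values are hypothetical:
c = np.array([[0.0], [2.0]])
s = np.ones((2, 1))
b = np.array([1.0, 3.0])
print(mamdani_singleton(np.array([1.0]), c, s, b))  # halfway: both rules fire equally
```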
The parameters of this system can be optimized utilizing gradient descent on the b, σ
and c parameters of each membership function. The partial derivatives of the objective
function (2.33) with respect to the b, σ and c parameters are as follows:
∂E/∂b_r = (ŷ − y) w_r(x) / Σ_{r=1}^{R} w_r(x)  (2.43)

∂E/∂σ_{r,p} = (ŷ − y) [ (b_r − ŷ) / Σ_{r=1}^{R} w_r(x) ] w_r(x) (x_p − c_{r,p})² / (σ_{r,p})³  (2.44)

∂E/∂c_{r,p} = (ŷ − y) [ (b_r − ŷ) / Σ_{r=1}^{R} w_r(x) ] w_r(x) (x_p − c_{r,p}) / (σ_{r,p})²  (2.45)
The ANFIS model is based on the TSK FIS [10]. Rules are composed of premises
whose membership functions are usually selected to be Gaussian; the model utilizes the
product t-norm for the conjunction and implication operations and the algebraic sum
for aggregating rules.
The ANFIS model can be described as a five-layered system as shown in Figure 2.7.
The first layer fuzzifies the input (2.46), the second layer calculates the firing strength
(2.47). The third layer performs a rule normalization operation (2.48). The fourth layer
calculates the inference (2.49). The fifth layer aggregates the rules with the algebraic
sum operation (2.50).
O^1_{r,p} = μ_{r,p} = exp( −½ ( (x_p − c_{r,p}) / σ_{r,p} )² )  (2.46)

O^2_r = w_r = ∏_{p=1}^{P} μ_{r,p}  (2.47)

O^3_r = w̄_r = w_r / Σ_{r=1}^{R} w_r  (2.48)

O^4_r = w̄_r y_r = w̄_r ( x_1 b_{1,r} + … + x_P b_{P,r} + b_{0,r} )  (2.49)

O^5 = ŷ = Σ_{r=1}^{R} w̄_r y_r  (2.50)
What differentiates the ANFIS from the TSK model is the application of a hybrid
learning method. The premise parameters (σ, c) are optimized utilizing
error-backpropagation, while the consequent parameters b are estimated with a least
squares algorithm. Arranging the normalized firing strengths and the inputs of the N
training instances gives:

Λ = [ w̄_1^1  w̄_1^1 x_1^1  ⋯  w̄_1^1 x_P^1  w̄_2^1  ⋯  w̄_R^1 x_P^1 ]
    [ w̄_1^2  w̄_1^2 x_1^2  ⋯  w̄_1^2 x_P^2  w̄_2^2  ⋯  w̄_R^2 x_P^2 ]
    [ ⋮ ]
    [ w̄_1^N  w̄_1^N x_1^N  ⋯  w̄_1^N x_P^N  w̄_2^N  ⋯  w̄_R^N x_P^N ]

b = [ b_{1,0}  b_{1,1}  ⋯  b_{1,P}  b_{2,0}  b_{2,1}  ⋯  b_{R,P} ]ᵀ  (2.51)

b* = (ΛᵀΛ)⁻¹ Λᵀ y  (2.52)
where Λ is called the design matrix and N represents the number of instances in
the dataset. The hybrid optimization algorithm alternates at each training step between
the premise parameters σ and c, and the consequent parameters b.
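The least squares estimation of the consequents (2.51)–(2.52) can be sketched as follows; `np.linalg.lstsq` is used in place of the explicit pseudoinverse of (2.52) for numerical stability, and the synthetic data are hypothetical:

```python
import numpy as np

def anfis_consequents(X, w_bar, y):
    """Least-squares estimate (2.52) of the TSK consequent parameters given
    the normalised firing strength of each rule for each instance.
    X: (N, P) inputs, w_bar: (N, R) normalised strengths, y: (N,) targets."""
    N, P = X.shape
    R = w_bar.shape[1]
    X1 = np.hstack([np.ones((N, 1)), X])         # [1, x_1, ..., x_P]
    # Design matrix (2.51): one block [w_r, w_r*x_1, ..., w_r*x_P] per rule.
    Lam = np.hstack([w_bar[:, [r]] * X1 for r in range(R)])
    b, *_ = np.linalg.lstsq(Lam, y, rcond=None)  # stable solution of (2.52)
    return b.reshape(R, P + 1)                   # rows [b_0, b_1, ..., b_P]

# Sanity check: a single always-active rule reduces to plain linear regression.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (40, 1))
w_bar = np.ones((40, 1))                         # one rule with w_bar = 1
y = 3.0 * X[:, 0] + 1.0
print(anfis_consequents(X, w_bar, y))            # approximately [[1., 3.]]
```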
Figure 2.9: (a) and (b): Interval type-2 Gaussian membership functions.
Type-2 and interval type-2 fuzzy inference systems have been applied to a wide
range of fields, including control [38], healthcare [39], and metallurgy [40].
Rough sets were developed by Pawlak in [41] to model vagueness and uncertainty.
A rough set is composed of two approximations: a lower approximation that contains
all the objects that certainly belong to a class and an upper approximation that contains
all the objects that may or may not belong to a class. An example of an information
table is shown in Table 2.4.
Therefore, two objects are indiscernible if they contain the same values for the
features in D. For example, in the information system shown in Table 2.4,
indiscernibility can be evaluated for feature subsets such as D_1 = {Feature 1, Feature 2, Feature 3}.
The lower and upper approximations of a set X given a feature subset P are defined as:
P̲X = { x | [x]_P ⊆ X }  (2.57)

P̄X = { x | [x]_P ∩ X ≠ ∅ }  (2.58)
The positive, negative and boundary regions of a rough set given two sets of
attributes P and Q are as follows:

POS_P(Q) = ∪_{X ∈ U/Q} P̲X  (2.59)

NEG_P(Q) = U − ∪_{X ∈ U/Q} P̄X  (2.60)

BND_P(Q) = ∪_{X ∈ U/Q} P̄X − ∪_{X ∈ U/Q} P̲X  (2.61)
The positive region contains all the objects of U that can be classified to a class of
U/Q given the information contained in the attributes P. The boundary region contains
the set of objects that cannot be classified with absolute certainty, and the negative
region contains the objects that certainly cannot be classified. In the example shown in
Table 2.4, the positive regions of D_1, D_2 and D_3 given Q = Output are as follows:

POS_{D_1}(Q) = {3, 4, 5, 6}  (2.62)

POS_{D_2}(Q) = {5}  (2.63)

POS_{D_3}(Q) = {3, 4, 5, 6}  (2.64)

From the positive region of D_1, instances {1, 2} do not form part of any class. The
reason for this is the conflict in the output Q: it is not possible to determine whether
such feature values determine a precise output, therefore instances {1, 2} are
considered inconsistent. For D_2 a decrease in the size of the positive region is seen;
this is because the removal of features, especially Feature 3, makes it impossible to
discern between objects and to classify the output appropriately. Additionally, it is seen
from the results that D_3 contains the same number of objects in its positive region as
D_1. The feature dependency can be measured as follows:
γ_D(Q) = |POS_D(Q)| / |U|  (2.65)
The feature dependency is a measure of how well a set of features can describe the
output. For the subsets D1, D2 and D3 from the example of Table 2.4 the feature
dependency is the following:
γ_{D_1}(Q) = |{3, 4, 5, 6}| / |{1, 2, 3, 4, 5, 6}| = 0.6666  (2.66)

γ_{D_2}(Q) = |{5}| / |{1, 2, 3, 4, 5, 6}| = 0.1666  (2.67)

γ_{D_3}(Q) = |{3, 4, 5, 6}| / |{1, 2, 3, 4, 5, 6}| = 0.6666  (2.68)
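A sketch of the crisp rough-set computations above: grouping objects by their feature values yields the indiscernibility classes, and the positive region collects the objects whose class maps to a single output, from which the dependency (2.65) follows. The small information system below is hypothetical, not Table 2.4 itself:

```python
from collections import defaultdict

def dependency(rows, features, output):
    """Rough-set feature dependency (2.65): the fraction of instances whose
    feature values map unambiguously to a single output class."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[f] for f in features)
        classes[key].add(row[output])
    positive = sum(1 for row in rows
                   if len(classes[tuple(row[f] for f in features)]) == 1)
    return positive / len(rows)

# Objects 1 and 2 share feature values but disagree on the output, so they
# are inconsistent and fall outside the positive region:
rows = [
    {"f1": "a", "f2": "x", "out": 0},
    {"f1": "a", "f2": "x", "out": 1},
    {"f1": "b", "f2": "x", "out": 0},
    {"f1": "b", "f2": "y", "out": 1},
    {"f1": "c", "f2": "y", "out": 1},
    {"f1": "c", "f2": "z", "out": 0},
]
print(dependency(rows, ["f1", "f2"], "out"))   # 4/6 = 0.666...
```

Dropping a feature can only merge indiscernibility classes, so the dependency can never increase when features are removed.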
Rough sets have been applied to a diverse number of applications such as knowledge
discovery [42] and clustering [43], but where they have been most successfully
applied is in the development of feature selection algorithms [44]–[47]. Rough
sets suffer from the limitation of being only applicable to qualitative datasets, which
limits their applicability considerably given that most real-world datasets are composed
of mixed-valued data. To solve this problem, fuzzy rough set
hybrids were developed [9]. Fuzzy rough sets are capable of modelling mixed datasets
given the continuous degree of membership of fuzzy sets.
Fuzzy rough set hybrids were initially proposed by Dubois and Prade in [9]; the
method consists of developing fuzzy partitions in the dataset. The fuzzy rough lower
and upper approximations are estimated as follows:

μ_{R̲_P X}(x) = inf_{y ∈ U} I( μ_{R_P}(x, y), μ_X(y) )  (2.71)

μ_{R̄_P X}(x) = sup_{y ∈ U} T( μ_{R_P}(x, y), μ_X(y) )  (2.72)

where I is a fuzzy implicator, T is a t-norm and μ_{R_P}(x, y) is a fuzzy similarity relation between
objects x and y for a feature p. Jensen and Shen [49] proposed the application of the
Łukasiewicz t-norm (2.74) and the Łukasiewicz implicator (2.75), and proposed the
fuzzy similarity relations (2.76)–(2.78):

T(a, b) = max(0, a + b − 1)  (2.74)

I(a, b) = min(1, 1 − a + b)  (2.75)
R_p(x, y) = exp( − (p(x) − p(y))² / (2σ_p²) )  (2.76)

R_p(x, y) = 1 − |p(x) − p(y)| / (p_max − p_min)  (2.77)

R_p(x, y) = max( min( (p(y) − (p(x) − σ_p)) / (p(x) − (p(x) − σ_p)), ((p(x) + σ_p) − p(y)) / ((p(x) + σ_p) − p(x)) ), 0 )  (2.78)
The positive region and the feature dependency of a fuzzy rough set are calculated as
follows:

μ_{POS_{R_P}(Q)}(x) = sup_{X ∈ U/Q} μ_{R̲_P X}(x)  (2.79)

γ'_P(Q) = Σ_{x ∈ U} μ_{POS_{R_P}(Q)}(x) / |U|  (2.80)
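A sketch of the fuzzy rough feature dependency (2.80) for crisp decision classes, using the similarity relation (2.77) and the Łukasiewicz implicator (2.75); as a simplification, the minimum t-norm is used to combine per-feature similarities (the data below are hypothetical):

```python
import numpy as np

def fuzzy_rough_dependency(X, labels):
    """Fuzzy-rough feature dependency (2.80) for real-valued features and
    crisp decision classes. X: (N, P) features, labels: (N,) class labels."""
    N, P = X.shape
    # Similarity per feature (2.77), combined across features (min t-norm).
    span = X.max(axis=0) - X.min(axis=0)
    sim = np.ones((N, N))
    for p in range(P):
        sim = np.minimum(sim, 1.0 - np.abs(X[:, [p]] - X[:, p]) / span[p])
    # Lower approximation of each decision class (2.71) with the
    # Lukasiewicz implicator, then the positive-region membership (2.79).
    pos = np.zeros(N)
    for c in np.unique(labels):
        in_c = (labels == c).astype(float)
        lower = np.min(np.minimum(1.0, 1.0 - sim + in_c), axis=1)  # inf of I
        pos = np.maximum(pos, lower)
    return pos.sum() / N                                           # (2.80)

# Two well-separated classes yield a dependency close to 1:
X = np.array([[0.0], [0.1], [0.9], [1.0]])
labels = np.array([0, 0, 1, 1])
print(fuzzy_rough_dependency(X, labels))
```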
Rough set theory and fuzzy rough set theory have been implemented successfully in
different fields such as pattern recognition [45], attribute selection [44], [45], [47],
[49]–[52], rule induction [53], classification [47], [54] and knowledge discovery [42],
[47].
CFS theory was first developed by Ramot et al. [8], [55]. A CFS S in a universe of
discourse U is defined as follows:

μ_S(x) = r_S(x) e^{jω_S(x)}  (2.81)

where j = √−1, and r_S and ω_S are the magnitude and the phase of the CFS respectively.
While traditional type-1 fuzzy sets lie within the interval [0, 1], the CFS lies within the
unit circle. The magnitude r_S represents a type-1 fuzzy set, and the phase ω_S is a
relative quantity that assigns the "context". This makes the type-1 fuzzy set a special
case of the CFS in which all phases are equal to zero.
According to [8], [55], the magnitude and the phase of the CFS are two separate
identities, and therefore the operations applied to one should not affect the other. In the
case of the complex fuzzy union and intersection, given two complex membership
functions A and B, the magnitudes of the resultant membership functions of the union
A ∪ B and the intersection A ∩ B are given as follows:

r_{A∪B}(x) = r_A(x) ⊕ r_B(x)  (2.82)

r_{A∩B}(x) = r_A(x) ⋆ r_B(x)  (2.83)

where ⊕ represents any t-conorm function and ⋆ represents any t-norm function.
The following equations (2.84)–(2.90) are proposed for the phase of both the union and
the intersection [8], [55]:
ω_{A∪B} = ω_A + ω_B  (2.84)

ω_{A∪B} = ω_A − ω_B  (2.87)

ω_{A∪B} = ω_A if r_A > r_B,  ω_B if r_B > r_A  (2.88)

ω_{A∪B} = (r_A ω_A + r_B ω_B) / (r_A + r_B)  (2.89)

ω_{A∪B} = (ω_A + ω_B) / 2  (2.90)
The characteristic operator of the CFS is the complex fuzzy aggregator, also called
vector aggregation [8], [55]. CFSs are composed of a magnitude and a phase, and
therefore exhibit "wave-like" properties: when two or more CFSs are aggregated, the
magnitude of the resultant vector depends on the phase alignment of the CFSs.
The definition of the complex fuzzy aggregation [55] is as follows:
Definition 4 [55]: Let A_1, A_2, …, A_n be CFSs defined on the universe of discourse U.
The vector aggregation is a function

v : { a | a ∈ ℂ, |a| ≤ 1 }ⁿ → { b | b ∈ ℂ, |b| ≤ 1 }  (2.91)

μ_A(x) = v( μ_{A_1}(x), μ_{A_2}(x), …, μ_{A_n}(x) ) = Σ_{i=1}^{n} w_i μ_{A_i}(x)  (2.92)

with w_i ∈ { a | a ∈ ℂ, |a| ≤ 1 } for all i, and Σ_{i=1}^{n} |w_i| = 1.
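The interference behaviour described above can be illustrated directly from (2.92): with weights of unit total magnitude, aligned phases reinforce the aggregated magnitude while opposed phases cancel it. The weights and grades below are arbitrary:

```python
import numpy as np

def vector_aggregate(grades, weights):
    """Complex fuzzy vector aggregation (2.92): a weighted sum of complex
    membership grades, with total weight magnitude equal to one."""
    grades, weights = np.asarray(grades), np.asarray(weights)
    assert abs(np.abs(weights).sum() - 1.0) < 1e-9
    return np.sum(weights * grades)

r = 0.8
aligned = [r * np.exp(1j * 0.0), r * np.exp(1j * 0.0)]
opposed = [r * np.exp(1j * 0.0), r * np.exp(1j * np.pi)]
w = [0.5, 0.5]
print(abs(vector_aggregate(aligned, w)))   # 0.8: constructive interference
print(abs(vector_aggregate(opposed, w)))   # ~0: destructive interference
```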
For the implication operator, the proposed function is the algebraic product (2.93):

μ_{A→B}(x, y) = μ_A(x) · μ_B(y)  (2.93)

where:

r_{A→B}(x, y) = r_A(x) · r_B(y)  (2.94)

ω_{A→B}(x, y) = ω_A(x) + ω_B(y)  (2.95)
The magnitude and the phase of the CFS proposed in [8] and [55] have separate
identities, and the operations performed on one should not have an effect on the other.
Dick [56] defines this CFS as one "with rotational invariance". A rotationally invariant
CFS has several limitations; most importantly, Dick demonstrates that "the
algebraic product cannot be used as a conjunction operation" in a rotationally
invariant CFS [56], [57], even though Ramot et al. utilize the product function as
implication. To resolve these limitations Dick proposes a CFS "without rotational
invariance" based on vector logic, where the magnitude and phase are not separate
identities. Dick proves that in a CFS without rotational invariance the algebraic product
can be used as a conjunction operation [56], [57].
Tamir [58] expands the original idea of CFS devised by Ramot et al. [8], [55] and
proposes a "pure CFS". The rotationally invariant CFS only conveys the fuzzy
information in the magnitude; in a pure CFS both the magnitude and the phase convey
fuzzy information, and the pure CFS can alternatively be represented in rectangular
form. In a pure CFS either the real or the imaginary part (alternatively, the magnitude
or the phase) represents a fuzzy set, while the other represents a fuzzy class. Fuzzy
classes [59] are sets of fuzzy sets; therefore a pure CFS represents the membership of
an object in both a fuzzy class and a fuzzy set.
The field of CFSs and logic is relatively new, and more research and applications are
being developed. This has led to the development of different CFS variants, including
those based on Atanassov intuitionistic fuzzy sets [60], the Pythagorean fuzzy
sets [61] and the complex intuitionistic fuzzy sets [62]. Complex neutrosophic sets
have also been proposed [63].
In [64] the authors make a comparison between the CFS and type-2 fuzzy sets;
among their conclusions, it is of importance to note the following:
1) The CFS conveys an extra dimension of information, while a type-2 fuzzy set is
used to represent uncertainty.
2) In 3 dimensions a type-2 fuzzy set represents a surface while the CFS represents
a trajectory.
Additional work on type-2 and interval-valued complex fuzzy sets can also be found
in [65]–[67]. A comprehensive review of the state of the art of CFSs can be found in
[57].
Complex fuzzy inference systems (CFISs) are a set of FIS based on the CFS with
rotational invariance proposed by Ramot et al. in [8] and the CFS without rotational
invariance proposed by Dick in [56]. These CFISs are not to be confused with complex
valued fuzzy inference systems which are not based on CFS but are based on either
complex fuzzy numbers or the application of complex valued information in the FIS
[68]–[72]. The CFISs developed so far are the Adaptive Neuro Fuzzy Complex
Inference System (ANCFIS) [73], the Complex Neuro Fuzzy System (CNFS) [74], and
the Adaptive Complex Neuro Fuzzy Inferential System (ACNFIS) [75].
The first CFIS developed was the ANCFIS [73]. The ANCFIS is a six-layered
system (Figure 2.11) based on the ANFIS architecture [76], designed specifically to
model time series data utilizing CFSs without rotational invariance [56]. Unlike
most FISs, the ANCFIS utilizes a sinusoidal membership function. It is known from
the Fourier theorem that any periodic function can be approximated with a series of
sums of sines and cosines; a sinusoidal membership function is therefore proposed in
[73] to capture the most important frequencies and model the approximately periodic
behaviour of an input window.
r_s(θ) = d sin(aθ + b) + c  (2.96)
where r and θ are the magnitude and phase of the CFS respectively, and the parameters
a, b, c and d modify the frequency, phase shift, vertical shift and amplitude respectively.
Figure 2.11: The six-layer ANCFIS architecture.
The first layer of the ANCFIS convolves an input window of the time series dataset. The
second layer calculates the firing strength of the rules utilizing the algebraic product.
The third layer normalizes the firing strength; during normalization only the magnitudes
of the CFSs are normalized and the phases are left unchanged. The fourth layer is an
additional layer not present in the ANFIS model, called the rule interference layer:
instead of utilizing the vector aggregation proposed by Ramot et al., the interferences
are created by applying a dot product between the rules, and the output of the fourth
layer is a real-valued scalar. The fifth layer calculates the consequent parameters and
multiplies them by the output of the fourth layer. The sixth layer is the output layer,
where the scalar outputs of the rules are summed.
The ANCFIS model utilizes an input window instead of delayed inputs; this reduces
the number of rules to the number of input windows, creating a compact FIS. The
parameters are optimized utilizing a hybrid optimization algorithm: in the forward pass
a least squares algorithm is used to update the consequents, while the backward pass
utilizes a combination of complex back propagation [77] and derivative-free
optimization to update the premise parameters.
Variations on the ANCFIS input type, architecture and operations have been
explored throughout its development and the author encourages the reader to research
the work done for the ANCFIS model.
The ANCFIS model has been applied to different datasets: the Wolfer sunspot
numbers [73], [78], [79], the Mackey-Glass 17 [73], [78], the Santa Fe laser dataset
[73], [78], stellar brightness [78], wave heights [78], [79] and a photovoltaic power
dataset [80]. The ANCFIS has also been implemented successfully in modelling
multivariate time series, such as monthly motel occupancy [81], [82], monthly flour
prices [81], [82], monthly precipitation in different areas of Tennessee [81], [82] and
the NASDAQ [82]. A variation of the training algorithm incorporating extreme learning
machines was applied to four different software reliability growth datasets [83]. The
reported results obtained from the ANCFIS are comparable with other models while
maintaining a compact model, utilizing in some circumstances fewer than 3 rules to
model complex datasets and chaotic time series.
The CNFS is based on the ANFIS architecture [76] and CFSs with rotational
invariance [8]; the system utilizes a complex Gaussian membership function. A hybrid
learning algorithm is applied for the training, which consists of a least squares algorithm
for the consequents and a derivative-free optimization algorithm for the premises. The
model output is a complex number with a real and an imaginary part, defined as the dual
output property. The real part is generally used as the final output of the system, while
the dual output property is explored in [84] and [85].
Two different types of complex Gaussian membership function are utilized. Initially
in [86]–[88] the membership function used is the Gaussian membership function
represented in rectangular form:
In subsequent papers [84], [85], [89], the complex Gaussian membership function is
modified to add a term called the frequency factor which multiplies the phase of the
membership function, and the polar representation is utilized:
μ(h) = exp( −½ ( (h − m) / σ )² ) exp( −jλ ( (h − m) / σ )² )  (2.98)
The first layer of the CNFS calculates the value of the complex membership utilizing
either (2.97) or (2.98). The second layer calculates the firing strengths according to
(2.83), utilizing the product operation for the magnitudes and the addition operation for
the phases (2.84). The third layer normalizes the whole complex number. The fourth
layer calculates the linear consequents and multiplies them by the normalized weights
from the third layer. The fifth layer calculates the output by summing the signals of the
network; the real part is used as the final output, although the imaginary part can also
be used as an output in certain circumstances.
CNFSs have been applied to function approximation [74], noise cancelling [86],
time series prediction [87], [89] and knowledge discovery [88]. The dual output property
is explored in [84] for financial purposes to calculate both the openings and closings of
the NASDAQ and in another instance to calculate simultaneously the TAIEX index and
the Dow Jones with the real and the imaginary part of the complex output.
2.6.4.3 The Adaptive Complex Neuro–Fuzzy Inferential System
The ACNFIS [75] is a five-layer FIS (Figure 2.12) based on the ANFIS model [76] that
utilizes a CFS with rotational invariance [8]. The ACNFIS utilizes two Gaussian
functions as the magnitude and phase membership functions; because "a complex valued
function cannot be both analytical and bounded unless it is a constant" [75], the complex
membership function utilizes two real-valued functions to bound the complex
membership within the unit circle. The complex membership function is as follows:
μ_j(x) = exp( −( (x − c_{A_j}) / a_{A_j} )² ) exp( −j ( (x − c_{P_j}) / a_{P_j} )² )  (2.101)
Figure 2.12: The five-layer ACNFIS architecture.
The first layer of the system calculates the complex membership function according
to (2.101). The second layer calculates the firing strengths according to (2.83), utilizing
the product operation for the magnitudes and the addition operation for the phases
(2.84). The third layer normalizes the magnitude of the complex number. The fourth
layer calculates the linear consequents; in the ACNFIS two linear consequents per
rule are calculated, one for the real part and one for the imaginary part. The real part is
utilized as the final output. The system utilizes the Levenberg-Marquardt (LM)
optimization algorithm for training.
For Lipton [91], the interpretability of a model can be decomposed into two main
properties: transparency and post-hoc explanations. Transparency is the property of a
model to explain how it works, both in its entirety and through its individual components.
Post-hoc explanations relate to the representation of information to extract knowledge
about a process.
The main advantage of utilizing FISs over other modelling approaches is the
interpretability and transparency that fuzzy logic provides. Good performance and
generalization properties have been shown, with the additional advantage, already
explained in a previous section, of soft boundaries. There exists no clear mathematical
definition of interpretability and transparency; nevertheless, a few guidelines [93] and
measurements [7], [93]–[96] can be taken into consideration to develop more
interpretable and transparent FISs.
The first quadrant relates to the number of rules and the number of conditions per
rule. Maintaining a parsimonious model is essential for interpretability. It is known in
psychology that humans struggle to process more than seven information objects; in
[97] the number of information objects that a human can process was found to be 7 ± 2.
Therefore, it is important to maintain rule-base systems with no more than 9 premises
per rule [93].
The second quadrant relates to the number of features and the number of
membership functions per feature. The limit of humans to process information was
mentioned in the previous paragraph [93].
The third quadrant relates to the consistency of a rule-base, and the number of rules
fired at the same time. A rule-base is considered consistent when there are no
contradictory rules [93].
For TSK FISs, interpretability is considerably reduced given that the consequents of
the rule-base are composed of linear regression models and not linguistic variables.
However, the TSK FIS is an ensemble of local linear models, and linear regression
models are transparent in that it is possible to assess the impact of each feature on the
output; this property allows TSK models to be interpretable to some extent. A TSK FIS
can therefore be locally interpretable. In order to maintain the interpretability of the
TSK FIS, some authors have developed learning algorithms that maintain a local-global
performance balance [98], [99].
2.8 Summary
Fuzzy sets and logic were developed to model the complexity and vagueness of
human natural language [5]. Fuzzy statements are arranged in the form of if-then rules,
capable of modelling natural phenomena intuitively. This arrangement of if-then rules
is defined as a FIS. The two main types of FIS are Mamdani [6] and TSK [14]; Mamdani
FISs are more interpretable given that they only utilize linguistic variables to form their
rule-bases, while TSK FISs are more accurate given that their consequents are composed
of linear functions. To model different phenomena, several expansions to the type-1
fuzzy set have been developed; these include fuzzy rough sets and CFSs. Rough sets are
composed of two approximations to represent the possible membership of an object;
fuzzy rough sets expand the applicability of rough sets by adding vagueness and soft
boundaries to membership values. CFSs add context and time to linguistic variables.
Only three CFISs have been developed to date: the ANCFIS [73], the
CNFIS [74] and the ACNFIS [75]. Results obtained from these CFISs are comparable with
other known FISs such as the RBFN and ANFIS. The ANCFIS was designed for time series
prediction; unlike most FISs, the ANCFIS utilizes a sinusoidal membership
function, and its rule interference is performed by a dot-product operation. Both the CNFIS
and ACNFIS neglect, for the most part, the effect and meaning of the imaginary
component of the CFS, thereby also neglecting the effect of the rule interference
operation. None of the CFISs developed to date address the problem of interpretability.
Chapter 3
Selected Datasets for Algorithms Validation
The models elicited in this work will utilize four real-world datasets. The first two
are industrial datasets obtained from material testing. The third is a dataset obtained
from a clinical study. The fourth is a publicly available dataset.
Metallurgy is a branch of material sciences that studies the behaviour of metals. The
field of metallurgy is divided into two main branches: ferrous metallurgy and
nonferrous metallurgy. Ferrous metals are those metals whose main alloying element
is iron. Among them, one of the most important alloys is steel, whose main components
are iron and carbon [100].
Metals are composed of microscopic crystal grains. Crystals are classified according
to the arrangement of the atoms composing them. Iron, the main component of steel,
can take three different structures, ferrite, austenite or martensite. The macrostructural
properties of steel rely on the microscopic structure and arrangement of these crystals.
The production, treatments and addition of alloying elements to steel change the
structure and arrangement of the crystals, thereby changing its properties [100].
The Charpy impact test is used to measure the fracture energy absorbed by a
material. A sample is placed in the Charpy impact test machine where a pendulum
strikes the sample and fractures it, registering the loss of potential energy of the
pendulum as the energy absorbed by the material [101]. To facilitate a fracture, samples
are machined to add a notch which creates a triaxial state of stress in the centre of the
sample [101]. The resistance to fracture is called “notch toughness” [101]. Fractures
can be classified as ductile or brittle; ductile fractures are associated with a higher
absorption of energy than brittle ones [102].
The Charpy impact test presents difficulties for modelling, mainly due to the scatter
in measurements [104] and the number of inconsistencies [105]; inconsistencies are
related to samples with the same or similar feature parameters and different outputs.
The inconsistencies present in the dataset are attributed to features not measured in the
dataset. Features, such as grain size and other micro-scale material properties, are time
consuming and/or expensive to measure [106] and therefore it is not uncommon for
these variables not to be found in the datasets.
The Charpy impact dataset utilized in this work consists of 1661 records, 16 features
and one output, which corresponds to the measured Charpy impact energy. A summary
of the dataset information is shown in Table 3.1. Additionally, a partial correlation plot
is shown in Figure 3.2.
The effects of the addition of other alloying elements and processes are harder to
measure and quantify; some alloying elements, such as chromium and nickel, are added
to a material to increase its resistance to corrosion. It is therefore important to understand
these relationships in order to perform a cost-benefit analysis or a trade-off between
different desired material properties.
The UTS is a common measure of a material's strength. In order to measure the UTS,
a sample is placed in a tensile test machine, which applies a load at a constant speed;
the deformation and required force are measured and the data are used to obtain stress-strain
curves. The UTS is defined as the maximum engineering stress and corresponds to the
maximum stress measured in a stress-strain curve [101].
The UTS dataset consists of 3760 records, 15 features and one output, which
corresponds to the UTS value. The characteristics of the dataset are shown in Table 3.2.
Additionally, 12 data points are used for validation; these 12 data points are outliers and
are therefore used to validate the generalization properties of a model [40].
The partial correlation plot is shown in Figure 3.3. As with the Charpy impact test,
only a few conclusions can be drawn from the partial correlation plot given the
nonlinear relationships between alloying elements, processes and material properties. It is
well known that increasing the carbon content of steel increases its strength but also its
brittleness, and that tempering increases the ductility of a carbon steel while decreasing its
strength.
Patients diagnosed with cancer are often given an estimate of the risk of
death/relapse from the disease. The risk estimation is based on the life expectancy
after the diagnosis; a common practice is to classify as high risk of mortality those patients
whose death may occur within the next 5 years, and as low risk those patients whose life
expectancy exceeds 5 years [40].
Records for which the event of interest is not observed within the study period are said
to be “censored” [107]. An example of the records and censoring is shown in Figure 3.4.
The branch of statistics that studies time-to-event data is called survival analysis.
The dataset consists of the records obtained from 2918 patients suffering from
bladder cancer; the dataset contains 16 features and 1 output, which corresponds to the
time of death or last observed time. Out of the 2918 patient records, 613 are marked as
non-censored. The dataset used in this work consists of the non-censored records as well
as those right-censored records whose last observed time surpassed the threshold of 60
months. The resulting dataset consists of the records of 1581 patients. A summary of
the dataset is shown in Table 3.3.
Figure 3.4: [39] Illustration of right censoring: Patients A and B outlived the study,
patient C was lost due to an unrelated event, and patient E withdrew from the study. The
records of patients A and F are the only ones not censored, as the time of death from the
event of interest occurred within the duration of the study. The recorded time is equal
to the observed time only. In this example, patient C's last observed time is 20 months,
as the observation period began at the 20th month and the patient was lost at the 40th month.
Patients whose last observed time is superior to 60 months are labelled as “1”, while
non-censored patients whose last observed time is below the 60-month threshold are
labelled as “0”. This is a simple solution that does not require the application of survival
analysis methods [108], which are out of the scope of this work. The dataset will be
treated as a least-squares problem.
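The labelling rule described above can be sketched in a few lines; the function name and the toy records below are illustrative assumptions, not part of the thesis:

```python
THRESHOLD = 60  # months: the 5-year risk horizon used above

def risk_label(last_observed_time, censored):
    """Return 1 (low risk), 0 (high risk), or None (record excluded).

    Hypothetical encoding of the selection rule described above: any
    record observed beyond the threshold is labelled 1; a non-censored
    death before the threshold is labelled 0; censored records lost
    before the threshold carry no usable label and are excluded.
    """
    if last_observed_time > THRESHOLD:
        return 1
    if not censored:
        return 0
    return None  # right-censored before 60 months: excluded

# Toy records as (last observed time in months, censored flag)
records = [(72, True), (30, False), (30, True), (65, False)]
print([risk_label(t, c) for t, c in records])  # -> [1, 0, None, 1]
```

The excluded records (censored before 60 months) are exactly those that would otherwise require survival analysis methods to handle.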
3.5 Superconductivity
Superconductors are materials known to have near-zero resistance when their
temperature is below a critical temperature [109]. The superconductivity dataset
consists of 21263 instances, 80 features and 1 output, which corresponds to the critical
temperature of such superconductors [109], [110].
3.6 Summary
A brief overview of the datasets explored in this work has been presented; each
dataset presents different challenges. The partial correlation plots for the Charpy impact
test and UTS were able to describe certain behaviour that is well understood in
material science, but the limitations of utilizing linear statistical methods for
knowledge extraction are clear.
The bladder cancer dataset presents difficulties given the amount of censored data
present in clinical studies. A modelling approach is presented that does not require the
application of statistical survival analysis tools, allowing the dataset to be modelled
utilizing a least-squares algorithm.
The superconductivity dataset contains a large number of features and instances;
the results obtained will therefore validate the application of the developed
algorithms to large datasets.
Given the known difficulties of modelling the Charpy impact test dataset, this set
will be analysed and tested in greater detail in comparison with the other datasets to
demonstrate the capabilities of the models and tools developed.
Chapter 4
The Single Input Complex Fuzzy Inference System
Model
4.1 Introduction
Complex Fuzzy Logic (CFL) and CFSs expand the traditional type-1 fuzzy sets and
logic to the unit circle. CFSs and CFL were initially developed by Ramot et al., who
proposed the utilization of CFSs to model periodic data [8], [55].
Most of the CFISs developed so far have explored the ability to represent approximately
periodic data with CFSs and have produced highly accurate results. Regardless of the
achievements of these CFISs, the problem of interpretability has not been fully
addressed.
According to Ramot et al. [55], the development of a CFL should retain the properties
of traditional fuzzy logic and benefit from the use of complex numbers; the authors
point to the following properties: 1) the framework should handle numerical data
and linguistic knowledge; 2) a CFL system must remain simple and intuitive; 3) rules
should be fired in parallel for efficiency [55].
The proposed Single Input Complex Fuzzy Inference System (SICFIS) model was
developed in accordance with these three requirements. In order to create an
interpretable CFIS, the structure needs to remain as simple as possible: the SICFIS
model represents a single-feature-partition-per-rule CFIS where the premises are
composed of type-1 fuzzy Gaussian membership functions and the consequences are
complex fuzzy singleton membership functions. This simple structure allows the user
to identify the relationship between feature partitions based on the phase difference of
the consequences; additionally, the system is capable of handling continuous,
categorical and linguistic data.
The simple structure of the SICFIS presents several advantages: 1. The number of
parameters grows linearly with the number of features in the dataset. 2. The
combinatorial rule explosion problem is avoided. 3. It is not necessary to execute a
clustering algorithm or the assistance of expert knowledge to create an initial rule-base.
Therefore, training time is reduced considerably since the number of operations and
parameters are lower than traditional FISs. Additionally, a parsimonious model should
be able to reduce the probability of overfitting [111].
In this chapter the SICFIS model is tested on three different datasets. The first dataset
is used for the prediction of the Charpy impact energy of steel. The second dataset is used
for the prediction of the UTS of steel. The third dataset consists of predicting the risk of
mortality for bladder cancer patients. Results obtained from the three different datasets
show a level of accuracy equivalent to RBFN and ANFIS models, simple ANNs, as well
as other type-1 and type-2 FISs. An interpretability analysis applied to the Charpy
impact test will demonstrate that the knowledge extracted from the model is consistent
with what is known in the literature.
Most of the applications of CFS, as originally proposed in [8], have mainly focused
on modelling datasets which contain approximately periodic data. However, to
illustrate the applicability as well as the advantage of CFSs in generic data modelling
problems, Ramot et al. proposed an application where CFSs are used to predict voter
turnout in an election [55] through the use of the two rules shown in Table 4.1.
According to Ramot et al., while each individual rule, when true, predicts a high or
very high voter turnout, when both of them are true the voter turnout is in fact low
[55]. This phenomenon can be easily and elegantly modelled by assigning different
phases to each rule in order to cause a destructive interference. The proposed SICFIS
model expands on this idea to create a compact model capable of describing the
complex interaction between feature partitions.
A similar model was proposed in [112]. Although the proposed methods are similar,
the authors of [112] fail to provide any results. Additionally, the equations presented are
identical to the ones of the real-valued SIRM model proposed in [15].
Therefore, due to the lack of results and evidence provided in [112], the SICFIS model
proposed in this work is the first interpretable CFIS.
4.2.1 The Single Input Complex Fuzzy Inference System Membership Function
The SICFIS model utilizes a real-valued Gaussian membership function for the
premises; for a feature p and a partition s_p the membership function is as follows:
\mu_{p,s_p} = \exp\left( -\frac{1}{2} \left( \frac{x_p - c_{p,s_p}}{\sigma_{p,s_p}} \right)^2 \right)   (4.1)

where c_{p,s_p} and \sigma_{p,s_p} are the centre and the spread of the Gaussian membership function
respectively.
The consequences are complex fuzzy singleton membership functions:

\bar{\beta}_{p,s_p} = \beta_{p,s_p} e^{j\theta_{p,s_p}}   (4.2)

\beta^{\mathrm{Re}}_{p,s_p} = \beta_{p,s_p} \cos(\theta_{p,s_p})   (4.3)

\beta^{\mathrm{Im}}_{p,s_p} = j\,\beta_{p,s_p} \sin(\theta_{p,s_p})   (4.4)
where \beta_{p,s_p} represents the magnitude and \theta_{p,s_p} represents the phase. Equations (4.3) and
(4.4) show the rectangular coordinates of the singleton membership function; both
parameters \beta_{p,s_p} and \theta_{p,s_p} are real-valued scalars.
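As a minimal sketch (not the thesis implementation), the premise and consequence membership functions of (4.1)-(4.4) map directly onto Python's built-in complex type; the function names and values below are illustrative:

```python
import cmath
import math

def gaussian_mf(x, c, sigma):
    """Type-1 Gaussian premise membership, Eq. (4.1)."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def complex_singleton(beta, theta):
    """Complex fuzzy singleton consequence, Eq. (4.2): beta * e^(j*theta)."""
    return beta * cmath.exp(1j * theta)

mu = gaussian_mf(0.4, c=0.5, sigma=0.2)   # membership of x = 0.4
b = complex_singleton(1.0, math.pi / 2)   # unit vector along the imaginary axis
print(round(mu, 4), round(b.imag, 4))     # -> 0.8825 1.0
```

The real and imaginary parts of `b` give the rectangular coordinates of Equations (4.3) and (4.4).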
4.2.2 The Single Input Complex Fuzzy Inference System Model Architecture
The SICFIS is a 5-layer model as shown in Figure 4.1. The first layer is the
fuzzification layer which assigns a degree of membership to a partition sp of a feature
p, according to:
O^{1}_{p,s_p} = \mu_{p,s_p} = \exp\left( -\frac{1}{2} \left( \frac{x_p - c_{p,s_p}}{\sigma_{p,s_p}} \right)^2 \right)   (4.5)
The second layer performs a normalization operation for the sp partitions of a
feature p as follows:
O^{2}_{p,s_p} = \frac{\mu_{p,s_p}}{\sum_{s_p=1}^{S_p} \mu_{p,s_p}}   (4.6)
The third layer performs the implication operation. The algebraic product is selected
as the implication operation. The output of the second layer (4.6) multiplies the complex
singleton membership function, (4.2). The rectangular form of the complex singleton
membership function is used in order to facilitate calculations as follows:
O^{3}_{\mathrm{Re},p,s_p} = O^{2}_{p,s_p}\,\beta_{p,s_p}\cos(\theta_{p,s_p})   (4.7)

O^{3}_{\mathrm{Im},p,s_p} = O^{2}_{p,s_p}\,\beta_{p,s_p}\sin(\theta_{p,s_p})   (4.8)
The fourth layer is the vector aggregation (or rule interference) layer, in which the real
and imaginary parts are added respectively as follows:
O^{4}_{\mathrm{Re}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} O^{3}_{\mathrm{Re},p,s_p}   (4.9)

O^{4}_{\mathrm{Im}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} O^{3}_{\mathrm{Im},p,s_p}   (4.10)
The fifth layer calculates the magnitude and the phase of the resultant vector as
follows:
O^{5} = \left| O^{4} \right| \angle \arg\left( O^{4} \right)   (4.11)
The magnitude of the resultant vector is utilized as the final output of the model to
evaluate its performance; the phase may be used as additional information to improve
the interpretability of the system, as will be demonstrated in this work.
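The five layers of (4.5)-(4.11) can be condensed into a short forward-pass sketch; the function and the toy parameter values below are illustrative assumptions, not the thesis implementation:

```python
import math

def sicfis_forward(x, centres, sigmas, betas, thetas):
    """Forward pass of the normalized SICFIS, Eqs. (4.5)-(4.11).

    All parameters are lists-of-lists indexed [feature][partition].
    Returns (magnitude, phase) of the resultant vector O^5.
    """
    resultant = complex(0.0, 0.0)
    for p, xp in enumerate(x):
        # Layer 1: Gaussian fuzzification of feature p (4.5)
        mu = [math.exp(-0.5 * ((xp - c) / s) ** 2)
              for c, s in zip(centres[p], sigmas[p])]
        total = sum(mu)  # Layer 2 normalizer (4.6)
        for m, b, t in zip(mu, betas[p], thetas[p]):
            # Layer 3: implication with the complex singleton (4.7)-(4.8)
            # Layer 4: vector aggregation / rule interference (4.9)-(4.10)
            resultant += (m / total) * b * complex(math.cos(t), math.sin(t))
    # Layer 5 (4.11): magnitude is the crisp output, phase is auxiliary
    return abs(resultant), math.atan2(resultant.imag, resultant.real)

mag, phase = sicfis_forward(
    x=[0.3, 0.7],
    centres=[[0.0, 0.5, 1.0], [0.0, 0.5, 1.0]],
    sigmas=[[0.2143] * 3, [0.2143] * 3],
    betas=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    thetas=[[0.0, math.pi / 2, math.pi], [0.0, math.pi / 2, math.pi]])
```

Note that the whole pass is a single weighted vector sum, which is why the parameter count and the operation count grow linearly with the number of feature partitions.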
In order to improve the results of the optimization it is important to select a valid
initial model, since a randomly or inadequately initialized model is more likely to
drive the optimization algorithm into a sub-optimal solution. The initialization of the
model works as follows: for the premises, a grid partition of the data is performed; each
feature p is divided into S_p partitions (Figure 4.3), each with a centre
and Standard Deviation (SD) chosen as recommended in [113], such that the membership
values are continuous and the partitions intersect at approximately 0.5 membership value,
as shown in Figure 4.2(a). For the complex consequences, a phase is assigned to each
membership function, with the values of the phases linearly spaced between 0
and π, as shown in Figure 4.2(b). The initial values of β are obtained from the coefficients
of a partial correlation (PC) analysis as follows:
PC = \frac{ N\sum_{i=1}^{N}\varepsilon_{X,i}\,\varepsilon_{Y,i} - \sum_{i=1}^{N}\varepsilon_{X,i}\sum_{i=1}^{N}\varepsilon_{Y,i} }{ \sqrt{ N\sum_{i=1}^{N}\varepsilon_{X,i}^{2} - \left(\sum_{i=1}^{N}\varepsilon_{X,i}\right)^{2} }\; \sqrt{ N\sum_{i=1}^{N}\varepsilon_{Y,i}^{2} - \left(\sum_{i=1}^{N}\varepsilon_{Y,i}\right)^{2} } }   (4.12)

where \varepsilon_{X} and \varepsilon_{Y} are the residuals obtained from a linear regression and X, Y are the datasets.
The process is shown in Algorithm 4.1.
Figure 4.2: (a) Initial grid partition for a feature p. (b) Initial vector assigned to the
output of a rule, with a length equal to \beta_{p,s_p} and a phase equal to \theta_{p,s_p}.
\sigma_{p,k} \leftarrow 1 / \left( 2.3333\,(S_p - 1) \right)
c_{p,k} \leftarrow (k - 1) / (S_p - 1)
k \leftarrow k + 1
p \leftarrow p + 1
k \leftarrow 1
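A sketch of this initialization for one feature, assuming inputs normalized to [0, 1] (names and the normalization assumption are illustrative, not from the thesis):

```python
import math

def init_grid_partition(n_partitions):
    """Initial premise/consequence parameters for one feature, in the
    spirit of Algorithm 4.1: centres linearly spaced, a spread that makes
    neighbouring Gaussians intersect near 0.5 membership, and consequence
    phases linearly spaced between 0 and pi (Figure 4.2(b))."""
    S = n_partitions
    centres = [k / (S - 1) for k in range(S)]
    sigma = 1.0 / (2.3333 * (S - 1))
    thetas = [k * math.pi / (S - 1) for k in range(S)]
    return centres, [sigma] * S, thetas

centres, sigmas, thetas = init_grid_partition(3)
# Neighbouring partitions should cross near 0.5 membership at the midpoint
mid = 0.25
crossing = math.exp(-0.5 * ((mid - centres[0]) / sigmas[0]) ** 2)
print(round(crossing, 2))  # -> 0.51
```

The 2.3333 factor follows from the Gaussian half-width at half-maximum: with centres 1/(S-1) apart, it places the 0.5-membership point approximately at the midpoint between adjacent centres.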
The SICFIS model has several advantages over traditional fuzzy rule-base systems.
In order to highlight these advantages, as well as some considerations to be made for
assessing interpretability, the taxonomy introduced in Section 2.7.1 will be used.
The number of rules of the SICFIS is much lower than that of grid-partition-based
methods; the combinatorial rule explosion problem is avoided given that the number of
rules grows linearly with the addition of features and partitions. Given that the number
of rules is equal to the total number of feature partitions, the number of rules of the
SICFIS can, however, be greater than that of cluster-based methods.
The number of conditions per rule is clearly reduced since the SICFIS model is a
single feature partition per rule FIS. The number of conditions per rule in both grid-
partition and cluster-based methods is usually equal to the number of features in the
dataset.
The number of conditions per feature is considerably reduced in the SICFIS. While
the number of conditions per feature in cluster-based methods is equal to the number
of clusters or rules, and the number of conditions per feature in grid-partition methods
is equal to the size of the grid, it will be demonstrated in the following sections that a
small number of partitions per feature is sufficient for the SICFIS.
The problem of two or more contradictory rules being fired at the same time is
avoided completely, given that a rule corresponds to the behaviour of a specific feature
partition, the concept of contradiction does not apply to the SICFIS model.
Additionally, the main characteristic of the SICFIS model is the ability to model the
interaction between feature partitions as interferences.
It is well known that the visual representation of machine learning and AI models
facilitates the extraction of knowledge from a system and increases its interpretability.
The specific properties of the SICFIS model allow for the representation of knowledge in
different forms, presenting an additional advantage over traditional fuzzy rule-based
models. In the following subsections, different forms of representing knowledge will be
presented.
The Magnitude-Phase plots are composed of the resultant magnitude and phase of
each individual feature p for a specific range of operation. The magnitude (4.13) and
the phase (4.14) of a feature p are calculated as follows:

\left| h_p(k_p) \right| = \sqrt{ \left( \sum_{s_p=1}^{S_p} \bar{\mu}_{p,s_p}(k_p)\,\beta_{p,s_p}\cos(\theta_{p,s_p}) \right)^{2} + \left( \sum_{s_p=1}^{S_p} \bar{\mu}_{p,s_p}(k_p)\,\beta_{p,s_p}\sin(\theta_{p,s_p}) \right)^{2} }   (4.13)

\arg\left( h_p(k_p) \right) = \arctan\left( \frac{ \sum_{s_p=1}^{S_p} \bar{\mu}_{p,s_p}(k_p)\,\beta_{p,s_p}\sin(\theta_{p,s_p}) }{ \sum_{s_p=1}^{S_p} \bar{\mu}_{p,s_p}(k_p)\,\beta_{p,s_p}\cos(\theta_{p,s_p}) } \right)   (4.14)
where \bar{\mu}_{p,s_p}(k_p) is the normalized rule firing strength of a feature p and partition s_p,
which corresponds to the output of the second layer of the SICFIS model (4.6), and k_p is a
continuous variable with strictly increasing values within the specified range of
operation of a feature p. The transparency of the system can be demonstrated utilizing
the information contained in the magnitude-phase plots, as the behaviour of the system
for any combination of values within the range of operation can be assessed and
measured. An example of a magnitude-phase plot is shown in Figure 4.6.
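The curves behind a magnitude-phase plot can be computed as in the following sketch, sweeping k_p over the feature's range of operation (names and toy parameters are illustrative):

```python
import math

def magnitude_phase_curve(k_values, centres, sigmas, betas, thetas):
    """Per-feature resultant magnitude and phase, Eqs. (4.13)-(4.14),
    evaluated over a sweep of input values k_p for one feature."""
    mags, phases = [], []
    for k in k_values:
        mu = [math.exp(-0.5 * ((k - c) / s) ** 2)
              for c, s in zip(centres, sigmas)]
        total = sum(mu)  # normalized firing strengths, as in (4.6)
        re = sum(m / total * b * math.cos(t)
                 for m, b, t in zip(mu, betas, thetas))
        im = sum(m / total * b * math.sin(t)
                 for m, b, t in zip(mu, betas, thetas))
        mags.append(math.hypot(re, im))
        phases.append(math.atan2(im, re))
    return mags, phases

ks = [i / 10 for i in range(11)]  # range of operation swept in 0.1 steps
mags, phases = magnitude_phase_curve(
    ks, centres=[0.0, 0.5, 1.0], sigmas=[0.2143] * 3,
    betas=[1.0, 1.0, 1.0], thetas=[0.0, math.pi / 2, math.pi])
```

Plotting `mags` and `phases` against `ks` reproduces the two panels of a magnitude-phase plot for that feature.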
Although the SICFIS is not a traditional rule-base, it can nevertheless represent one.
A grid-partition rule-base can be created by measuring the resultant vector of all
possible combinations of feature partitions. The problem of combinatorial rule
explosion can be avoided by creating short rules [114] utilizing only the most important
feature partitions which can be easily assessed by measuring the magnitude of each
feature partition. This provides an additional level of control over the granularity and
interpretability of the model. Table 4.2 shows an example of a small SICFIS rule-base
and Table 4.3 shows the derived grid-partition rule-base from the SICFIS rule-base.
Table 4.3: Example of the derived grid-partition rule-base from the SICFIS rule-base.
Premise Consequence
If A1 is “High” and A2 is “High” Then B1 + B3
If A1 is “High” and A2 is “Low” Then B1 + B4
If A1 is “Low” and A2 is “High” Then B2 + B3
If A1 is “Low” and A2 is “Low” Then B2 + B4
The vector partition plot shows two different graphs: the first shows how a
feature p is partitioned into the different membership functions \mu_{p,s_p} for s_p = 1, \dots, S_p;
the second graphically represents the consequences corresponding to the partitions
of the feature p as two-dimensional vectors with a magnitude \beta_{p,s_p} and a phase \theta_{p,s_p}.
The vector partition plot presents the rule premises and consequences in an orderly
manner. This allows the user to identify and measure the interaction between different
partitions corresponding to different features. An example of the vector partition plot
of three features is shown in Figure 4.4.
The cosine distance matrix plot represents the level of interference between each pair
of partition consequences, with a number within [-1, 1] representing the degree to which
an interference is destructive or constructive respectively. The cosine distance matrix
information can be used, just as a Pearson correlation matrix plot, to derive knowledge;
compared with the correlation matrix, the cosine distance matrix is able to represent the
non-linear relationships between the different partitions. An example of the cosine
distance matrix plot is shown in Figure 4.5.
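The matrix itself is just the pairwise cosine similarity of the consequence vectors; a minimal sketch with invented magnitudes and phases:

```python
import math

def cosine_matrix(betas, thetas):
    """Pairwise cosine similarity between consequence vectors: +1 is a
    fully constructive interference, -1 fully destructive, 0 orthogonal."""
    vecs = [(b * math.cos(t), b * math.sin(t)) for b, t in zip(betas, thetas)]

    def cos_sim(u, v):
        dot = u[0] * v[0] + u[1] * v[1]
        return dot / (math.hypot(*u) * math.hypot(*v))

    return [[cos_sim(u, v) for v in vecs] for u in vecs]

M = cosine_matrix(betas=[1.0, 1.0, 1.0],
                  thetas=[0.0, math.pi / 2, math.pi])
# M[0][2] is -1: the first and third consequences interfere destructively
```

Because it depends only on the phases and magnitudes of the consequences, the matrix can be read directly off a trained model without re-evaluating it on data.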
It is known that increasing the percentage of carbon in steel improves its strength
until a threshold is met; any addition of carbon beyond this threshold will decrease its
strength as the material becomes too brittle. The content of carbon can be labelled as
low (L), medium (M), high (H) and very high (VH). For this example, two more
features are included: the content of iron, and a process “X” applied to the material in
order to improve its properties. For simplicity, let us assume the effect of the iron content
and the process “X” is the same for the whole range of possible input values; therefore
no feature partition is created for the iron content or the process “X”, as is done in the
case of carbon.
Suppose the threshold at which the carbon content becomes detrimental to the strength
is met when the content is H, and is completely detrimental when it reaches VH; then we
can infer that the output vector of H is orthogonal to the output vector of the iron content,
and that for VH a destructive interference occurs. Further, let us suppose that the process
“X” is known to improve the strength of high carbon steel and has little or no effect for
low, medium or very high carbon steel, meaning that a constructive interference occurs
with the H carbon partition, while for the rest of the input values little or no interference
occurs.
The SICFIS rule-base is shown in Table 4.4 and the corresponding grid-partition rule-base
is shown in Table 4.5. It is clear that the SICFIS rule-base contains fewer rules
than that of the grid partition; the difference becomes greater as more feature partitions
are created, as the number of rules grows exponentially for the grid-partition fuzzy rule-base
system and linearly for the SICFIS model.
Figure 4.4 shows the vector partition plot of this model. As mentioned previously,
the carbon content is partitioned into 4 membership functions, and the corresponding
output of each rule is shown below. No feature partition is implemented for the iron
content or the process “X”; therefore only one output vector is assigned to each. Figure 4.5
shows the cosine distance matrix plot, which displays the degree of interference between
the different feature partitions. Figure 4.6 shows the corresponding magnitude-phase
plots, which represent the magnitude and phase values of the feature vectors for all
possible values within the range of operation.
Figure 4.7 shows the results given three different scenarios: a) the total strength of a
high carbon steel when the process “X” is not applied, b) the total strength of high
carbon steel when the process “X” is applied and c) the total strength of medium carbon
steel when the process “X” is applied. From the results it can be confirmed that the
process “X” increases the strength of high carbon steel and has little effect on medium
carbon steel. Additionally, the high carbon steel with the process “X” has the same
strength as medium carbon steel. This simple example demonstrates how CFSs can be
used to model the complex interaction between alloying elements and processes
utilizing interferences.
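The scenario can be sketched with Python's complex numbers; the magnitudes and phases below are invented purely for this illustration:

```python
import cmath
import math

# Consequence vectors for the carbon-steel example; magnitudes and
# phases are invented for demonstration only.
fe = cmath.rect(1.0, 0.0)               # iron content: baseline strength
c_high = cmath.rect(0.8, math.pi / 2)   # high carbon: orthogonal to Fe
x_proc = cmath.rect(0.5, math.pi / 2)   # process "X": constructive with C_H

without_x = abs(fe + c_high)
with_x = abs(fe + c_high + x_proc)
print(with_x > without_x)  # -> True: "X" strengthens high-carbon steel
```

Adding `cmath.rect(0.8, math.pi)` for VH carbon instead would shrink the resultant, reproducing the destructive interference described above.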
Figure 4.4: Vector partition plot for Carbon (C), Iron (Fe) and the process “X”.
Figure 4.5: Cosine distance matrix plot for Carbon (C), Iron (Fe) and the process “X”.
Figure 4.6: Magnitude-Phase plots for Carbon (C), Iron (Fe) and the process “X”.
Figure 4.7: Resultant vector for high carbon steel, medium carbon steel with process
“X” and high carbon steel with process “X”.
4.5 Optimization
The parameters of the SICFIS model are optimized by minimizing a cost function f.
Applying the chain rule, the partial derivatives of f with respect to the consequence and
premise parameters are:

\frac{\partial f}{\partial \beta_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}}\frac{\partial h_{\mathrm{Re}}}{\partial \beta_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}}\frac{\partial h_{\mathrm{Im}}}{\partial \beta_{p,s_p}}   (4.15)

\frac{\partial f}{\partial \theta_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}}\frac{\partial h_{\mathrm{Re}}}{\partial \theta_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}}\frac{\partial h_{\mathrm{Im}}}{\partial \theta_{p,s_p}}   (4.16)

\frac{\partial f}{\partial \sigma_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}}\frac{\partial h_{\mathrm{Re}}}{\partial \mu_{p,s_p}}\frac{\partial \mu_{p,s_p}}{\partial \sigma_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}}\frac{\partial h_{\mathrm{Im}}}{\partial \mu_{p,s_p}}\frac{\partial \mu_{p,s_p}}{\partial \sigma_{p,s_p}}   (4.17)

\frac{\partial f}{\partial c_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}}\frac{\partial h_{\mathrm{Re}}}{\partial \mu_{p,s_p}}\frac{\partial \mu_{p,s_p}}{\partial c_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}}\frac{\partial h_{\mathrm{Im}}}{\partial \mu_{p,s_p}}\frac{\partial \mu_{p,s_p}}{\partial c_{p,s_p}}   (4.18)

The complete parameter vector is

w = \left[ \beta_{1,1}, \dots, \beta_{P,S_P},\; \theta_{1,1}, \dots, \theta_{P,S_P},\; \sigma_{1,1}, \dots, \sigma_{P,S_P},\; c_{1,1}, \dots, c_{P,S_P} \right]   (4.19)
Each iteration of the algorithm is called an epoch. The process is repeated until an
end condition is met; such conditions may include reaching a maximum number of
epochs or a local minimum.
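A generic epoch loop of this kind can be sketched as follows; numerical gradients stand in for the analytic derivatives (4.15)-(4.18), and all names and the toy loss are illustrative:

```python
def train(params, loss, lr=0.05, epochs=100, tol=1e-10, h=1e-6):
    """Epoch-loop gradient descent sketch. Forward-difference gradients
    replace the analytic chain-rule derivatives for brevity."""
    prev = loss(params)
    for _ in range(epochs):
        grad = []
        for i in range(len(params)):
            bumped = params[:]
            bumped[i] += h
            grad.append((loss(bumped) - prev) / h)
        params = [w - lr * g for w, g in zip(params, grad)]
        cur = loss(params)
        if abs(prev - cur) < tol:  # end condition: local minimum reached
            break
        prev = cur
    return params

# Toy quadratic loss with minimum at w = (2, -1)
w = train([0.0, 0.0], lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2)
```

In the actual SICFIS training the analytic gradients are cheaper and more accurate; the loop structure (epochs plus an end condition) is the same.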
Results from the recursive backpropagation for the Charpy impact test dataset are shown
in Figure 4.8. The model was trained for 50 epochs, taking a total of 1937 seconds to
compute on a Windows 10 computer with an Intel i5-9400F processor @ 2.90 GHz,
8.00 GB of installed RAM, and an NVIDIA 1660 6 GB Graphics Processing Unit (GPU).
The recursive and batch backpropagation algorithms utilize the information of the first
derivatives to find a local minimum. It is possible to improve the optimization
by including the information obtained from the second derivative. Algorithms that
utilize the second derivative are known as Newton-Raphson methods and require the
computation of the Hessian matrix; for large models, computing the Hessian matrix
becomes intractable [115]. The LM algorithm [116] utilizes an approximation of the
Hessian matrix based on the Jacobian, resulting in the fast and efficient optimization
algorithm shown in Algorithm 4.2.
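For a model with only two parameters, the LM update w ← w − (JᵀJ + λI)⁻¹Jᵀe can be written in closed form; the example below fits a toy linear model and is a sketch under invented values, not Algorithm 4.2 itself:

```python
def lm_step(w, J, e, lam):
    """One Levenberg-Marquardt update for a 2-parameter model:
    w <- w - (J^T J + lam*I)^(-1) J^T e, solved in closed form (2x2)."""
    a = sum(j[0] * j[0] for j in J) + lam   # A = J^T J + lam*I
    b = sum(j[0] * j[1] for j in J)
    d = sum(j[1] * j[1] for j in J) + lam
    g0 = sum(j[0] * ei for j, ei in zip(J, e))  # g = J^T e
    g1 = sum(j[1] * ei for j, ei in zip(J, e))
    det = a * d - b * b
    return [w[0] - (d * g0 - b * g1) / det,
            w[1] - (-b * g0 + a * g1) / det]

# Fit y = w0 + w1*x to two points; residuals e_i = (w0 + w1*x_i) - y_i
xs, ys = [0.0, 1.0], [1.0, 3.0]
w = [0.0, 0.0]
for _ in range(20):
    e = [w[0] + w[1] * x - y for x, y in zip(xs, ys)]
    J = [[1.0, x] for x in xs]  # Jacobian of residuals w.r.t. (w0, w1)
    w = lm_step(w, J, e, lam=0.1)
# w approaches (1, 2), the exact least-squares solution
```

The damping term λI is what keeps the approximated Hessian JᵀJ invertible and the step well-behaved far from the minimum.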
Figure 4.10 shows the training performance of the LM algorithm for each epoch
applied to the Charpy impact test dataset. The model was trained for 40 epochs, taking
a total of 2.8 seconds to compute on the same computer mentioned in the previous
section. The LM algorithm shows superior performance compared with both the recursive
and batch backpropagation algorithms; a further substantial reduction in computing time
is achieved through parallel computing and the approximation to the Hessian matrix.
The SICFIS model presented in this chapter has a simple architecture, making it
possible to train models within a few seconds with the addition of parallel computing.
By making a few modifications to the model, it is possible to obtain a faster SICFIS
model, reducing the training times even further. An equivalent model, which maintains
the advantages presented in Section 4.4, can be obtained by removing the
normalization operation in the second layer. This modification reduces the number of
operations considerably, especially for larger datasets. The fast-SICFIS model is a 4-layer
system, as observed in Figure 4.11.
The first layer is the fuzzification layer which assigns a degree of membership to a
partition sp of a feature p, according to
O^{1}_{p,s_p} = \mu_{p,s_p}   (4.20)
The second layer is the implication operation, which multiplies the premises and the
consequences. The rectangular form of the complex singleton membership function is
used in order to facilitate calculations as follows:
O^{2}_{\mathrm{Re},p,s_p} = O^{1}_{p,s_p}\,\beta_{p,s_p}\cos(\theta_{p,s_p})   (4.21)

O^{2}_{\mathrm{Im},p,s_p} = O^{1}_{p,s_p}\,\beta_{p,s_p}\sin(\theta_{p,s_p})   (4.22)
The third layer is the vector aggregation (or rule interference) layer in which the real
and imaginary parts are added respectively as follows:
O^{3}_{\mathrm{Re}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} O^{2}_{\mathrm{Re},p,s_p}   (4.23)

O^{3}_{\mathrm{Im}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} O^{2}_{\mathrm{Im},p,s_p}   (4.24)
The fourth layer calculates the magnitude and the phase of the resultant vector as
follows:
O^{4} = \left| O^{3} \right| \angle \arg\left( O^{3} \right)   (4.25)
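The fast variant of the forward pass, (4.20)-(4.25), differs from the normalized one only in dropping the per-feature normalization; a sketch with illustrative names and parameters:

```python
import math

def fast_sicfis_forward(x, centres, sigmas, betas, thetas):
    """Fast-SICFIS forward pass, Eqs. (4.20)-(4.25): as the normalized
    model but with the normalization layer removed."""
    re = im = 0.0
    for p, xp in enumerate(x):
        for c, s, b, t in zip(centres[p], sigmas[p], betas[p], thetas[p]):
            mu = math.exp(-0.5 * ((xp - c) / s) ** 2)  # layer 1 (4.20)
            re += mu * b * math.cos(t)   # layers 2-3 (4.21), (4.23)
            im += mu * b * math.sin(t)   # layers 2-3 (4.22), (4.24)
    return math.hypot(re, im), math.atan2(im, re)  # layer 4 (4.25)

mag, phase = fast_sicfis_forward(
    [0.3], [[0.0, 0.5, 1.0]], [[0.2143] * 3],
    [[1.0, 1.0, 1.0]], [[0.0, math.pi / 2, math.pi]])
```

Dropping the normalizer removes one sum and one division per feature partition, which is where the speed-up for large datasets comes from.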
The Charpy impact dataset will be utilized to compare the training times and
performance of the normalized-SICFIS and fast-SICFIS models. The LM algorithm
presented in the previous section provided the best results and is therefore selected for
this analysis. Each feature is partitioned into three partitions. The models are trained
from 20 to 70 epochs, the RMSE is utilized to measure the performance, and a 5-fold
cross-validation is applied; the mean RMSE at each epoch is recorded. The results are
shown in Figure 4.12.
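The evaluation protocol can be sketched generically as below; `fit` and `predict` are stand-ins (here a trivial mean predictor) for SICFIS training and inference, and all names are illustrative:

```python
import math
import random

def kfold_rmse(data, k, fit, predict):
    """k-fold cross-validation sketch: mean test RMSE over k folds."""
    random.seed(0)
    idx = list(range(len(data)))
    random.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rmses = []
    for fold in folds:
        train = [data[i] for i in idx if i not in fold]
        test = [data[i] for i in fold]
        model = fit(train)
        err = [(predict(model, x) - y) ** 2 for x, y in test]
        rmses.append(math.sqrt(sum(err) / len(err)))
    return sum(rmses) / k

# Toy data; the model simply predicts the mean of the training targets
data = [(x, 2.0 * x) for x in range(20)]
mean_rmse = kfold_rmse(data, 5,
                       fit=lambda tr: sum(y for _, y in tr) / len(tr),
                       predict=lambda m, x: m)
```

Recording the mean test RMSE at each epoch, rather than once after training, is what produces the per-epoch curves of Figure 4.12.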
Figure 4.12: Charpy impact dataset, training, checking and testing performance for
different numbers of epochs for the normalized and fast SICFIS models.
Figure 4.13 shows the required computation time for different numbers of epochs. A
linear increase of computational time with the addition of epochs is observed, although
with different slopes. For 210 epochs the training times were 12.12 and
6.61 seconds for the normalized and the fast SICFIS models respectively; the normalized
model requires roughly twice the computation time. The difference between the RMSE
values is minimal and may be attributed to random effects. Further comparisons of the
performance between the models will be presented in the results sections for the three
real-world datasets.
Figure 4.13: Charpy impact dataset, training times for the normalized and fast SICFIS
models for different numbers of epochs.
4.7 Results
For the Charpy impact dataset the parameter grid is shown in Table 4.6. The RMSE
is used to measure the performance of the models. Summaries of the results of the
normalized-SICFIS and the fast-SICFIS models are shown in Table 4.7 and Table 4.8
respectively. The best models obtained from both the normalized and the fast model are
shown in Table 4.9. The regression plots of the best performing models for the fast and
normalized-SICFIS models are shown in Figure 4.14 and Figure 4.15 respectively.
For comparison purposes four additional models are shown in Table 4.10. The first model is a Mamdani FIS with singleton defuzzification, which is equivalent to an RBFN. It is a 9-rule FIS and the data partition is 56.25-18.75-25 for training, checking and testing respectively [25]. The second model is an ANFIS model with a quantum membership function [117]. It is a 6-rule FIS created utilizing a fuzzy c-means clustering algorithm, and the data partition is 55-15-30 for training, checking and testing respectively [117]. The third and fourth models are a 6- and an 8-rule Interval Type-2 TSK FIS (IT2-Squared) respectively, as proposed in [40] for UTS predictions. The data partition is 60-20-20 for training, checking and testing respectively.
The performances of the normalized and the fast SICFIS models are comparable; the lower SD of the normalized-SICFIS results indicates a more consistent performance. The best results of both models are similar. In comparison with the other models, the mean performance of both SICFIS models is comparable with the best models registered in Table 4.10, demonstrating the superiority of the SICFIS model in both performance and computation time.
Figure 4.14: Charpy Impact test, results regression plot, normalized-SICFIS model with 6 membership function partitions per feature.
Table 4.11: Charpy Impact, initial FIS and training computation times in seconds.

Model                   Initial FIS   Training
RBFN 9 Rules            1.694         1.440
ANFIS 9 Rules           1.728         4.727
Normalized-SICFIS 2mF   0.018         2.520
Fast-SICFIS 2mF         0.017         0.134
RBFN 10 Rules           1.772         1.618
ANFIS 10 Rules          1.772         5.508
Normalized-SICFIS 3mF   0.017         2.530
Fast-SICFIS 3mF         0.017         0.180
RBFN 11 Rules           1.847         1.759
ANFIS 11 Rules          1.824         6.554
Normalized-SICFIS 4mF   0.019         2.520
Fast-SICFIS 4mF         0.016         0.238
Figure 4.15: Charpy Impact test, results regression plot, fast-SICFIS model with 5 membership function partitions per feature.
The dataset includes two categorical features with 3 and 6 categories; a membership function per category is used for these features. The data partition is 70-30 for training and testing respectively, and the validation set consists of 12 data points. The parameter grid is shown in Table 4.13. The UTS results summary containing the mean and SD results from the normalized and fast SICFIS models is shown in Table 4.14 and Table 4.15 respectively.
The best models obtained from both the normalized and the fast model are shown in Table 4.16. The regression plots of the best performing models for the fast and normalized-SICFIS models are shown in Figure 4.16 and Figure 4.17 respectively.
Figure 4.16: UTS test, results regression plot, normalized-SICFIS model with 6 membership function partitions per feature.
Table 4.16: UTS Normalized and Fast SICFIS UTS Best Results.
No. mF Training Testing Validation All
Norm* Fast† Norm* Fast† Norm* Fast† Norm* Fast†
3mF 35.36 35.13 41.07 36.93 49.80 51.96 37.22 35.74
4mF 35.64 33.69 36.25 39.26 55.05 52.30 35.91 35.52
5mF 35.22 33.14 39.65 40.43 59.91 50.19 36.70 35.54
6mF 35.97 34.71 42.40 38.19 63.76 53.86 38.12 35.86
7mF 34.20 33.88 41.07 39.26 49.08 68.87 36.45 35.74
8mF 32.23 34.49 41.15 38.89 57.25 65.87 35.24 36.00
*Norm: Normalized-SICFIS model. †Fast: Fast-SICFIS model. ‡mF: membership function.
Figure 4.17: UTS test, results regression plot, fast-SICFIS model with 5 membership function partitions per feature.
For comparison purposes the results of three different FISs are shown: the IT2-Squared and the Multi-Objective Interval Type-2 Fuzzy Modelling (MOIT2FM) [118] are type-2 FISs, and the IMOFM-M [118] is a Mamdani type-1 FIS; all are composed of 6 rules. The RMSE is used to measure the performance and the results are shown in Table 4.17. The results are mixed: the normalized and fast SICFIS models outperform the other models on the training partition, the testing partition performance is equivalent to the IT2-Squared and MOIT2FM, while the validation partition underperforms in comparison.
The Bladder cancer dataset includes mostly categorical features. Three continuous features contain integer values and are therefore treated as categorical in this study; only one feature is treated as continuous. The Area Under the Curve (AUC) is used to measure performance for comparison with other models. The AUC is calculated on the same dataset as the one used in this work, that is, the dataset formed by the records of the non-censored patients and the records of the right-censored patients whose last observed time surpasses the 60-month threshold.
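The AUC can be computed directly from the model scores via the rank (Mann-Whitney) formulation, without an external library; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) score pairs ranked
    correctly, counting ties as half a win."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return float(wins / (len(pos) * len(neg)))
```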
The Bladder cancer parameter grid is shown in Table 4.18. A summary of the results
obtained by the normalized and fast SICFIS models are shown in Table 4.19 and Table
4.20 respectively. The best results obtained are shown in Table 4.21.
For comparison, 5 other models are shown in Table 4.22: the Cox regression model, a logistic regression model (LoR), an ANN and two FISs. The FISs shown in Table 4.22 have been integrated into the Cox regression model in order to perform a risk prognosis analysis. The first is a type-1 FIS with 20 Mamdani-type fuzzy rules and the second a type-2 FIS also composed of 20 Mamdani-type fuzzy rules. Further information regarding these models can be found in [39].
The Receiver Operating Characteristic (ROC) curves of the best results for the normalized and fast SICFIS models are shown in Figure 4.18 and Figure 4.20 respectively. The corresponding scatter plots of the scores are shown in Figure 4.19 and Figure 4.21 respectively. The optimum point is selected as the point at which the prediction accuracy is at its maximum. The confusion matrices corresponding to this optimum point are shown in Table 4.23 and Table 4.24.
A summary of the results obtained from the superconductivity dataset is shown in Table 4.25 and Table 4.26. The best results obtained for each number of membership functions per feature are shown in Table 4.27. A results comparison is shown in Table 4.28, where five different models appear: a linear regression model, an XG-Boost model, an ANFIS model and two ANNs. Both the linear regression and XG-Boost results are obtained from [109]. The data partition for the linear regression and the XG-Boost models is 2/3 for training and 1/3 for testing; the reported results are only for the out-of-sample data and no information is available for the remaining partitions. The data partition for the ANFIS model and the two ANNs is 65-18-17 for training, checking and testing respectively. The ANFIS model is composed of 8 rules, while the two ANNs are composed of 10 and 20 hidden layers respectively.
The interactions of processes and alloying elements and their effect on the material properties are complex and often difficult to represent. For the purpose of this analysis, the magnitude-phase plots of a selected number of features are shown in Figure 4.23. Because the Charpy impact dataset is known for its scattered measurements, this diagram is obtained from a SICFIS model trained with the complete dataset. For validation, the information in [119], which contains a comprehensive summary of the effect of alloying elements on notch toughness, will be utilized.
A scatter plot of the results is shown in Figure 4.22. The plot shows the predictions in the full complex plane; most of the predictions are located within the second and third quadrants.
As already stated, the Charpy impact test measures the notch toughness of a material and characterizes the DBTT. The impact temperature is an important variable in the model: it is known that at low temperatures the material becomes brittle and at high temperatures the material becomes ductile. Carbon is the main alloying element in steel; a high concentration of carbon causes the material to become brittle, and therefore an increase of carbon in steel is associated with a decrease in impact energy [119].
Given the known effects of both impact temperature and carbon, it is possible in general to associate a positive effect on impact energy with angles within the second and third quadrants, and a negative effect with angles within the first and fourth quadrants. There are exceptions to this, however, which depend mostly on the interaction with other alloying elements and the process [119].
Increasing the Manganese content reduces the transition temperature and improves the upper shelf energy in low carbon steel. A lesser effect is observed in medium carbon steel, and there is little effect on high carbon steels. Manganese can have the opposite effect on tempered and hardened steel. In the magnitude-phase plot of these alloying elements it is observed that a high content of Manganese is detrimental for high carbon steel, high hardening temperatures and tempering, while being beneficial to some extent for low carbon steel [119].
Nickel is used to improve the material's properties at low temperatures but is also known to have a negative effect on the upper shelf energy, while Chromium is known to increase the upper shelf energy. It is shown that a high content of nickel has a 180° phase difference with a high impact temperature, hence creating a negative interference, and it remains mostly orthogonal to a low impact temperature. Chromium's phase, however, would produce a positive interference with a high impact temperature and is orthogonal to a low impact temperature, which means its effect is mostly on the upper shelf energy [119].
Vanadium improves notch toughness [120], while the addition of Sulphur has a negative effect on notch toughness [119]. This is reflected in the plots by the fact that Sulphur is located within the fourth quadrant and Vanadium in the second and third quadrants.
4.9 Summary
To the author's best knowledge, the SICFIS model is the first interpretable CFIS based on CFSs. It was demonstrated that the SICFIS model performs equivalently to other well-known models with as few as 2 partitions per feature. Computational times are reduced substantially due to its simple structure and the application of GPU parallel computing.
The results obtained from the Charpy impact test are superior to those of other FISs. SICFIS was shown to be transparent and interpretable: the interpretability analysis performed on the magnitude-phase plots is consistent with what is currently known in the literature. Given the single input-partition-per-rule architecture of SICFIS it is possible to determine the individual effect of each alloying element and process. Moreover, eliciting an initial SICFIS is approximately 100 times faster than for traditional FISs utilizing a subtractive clustering algorithm. The training time is 10 and 30 times faster compared with the RBFN and the ANFIS models respectively. The fast-SICFIS model can improve the computation times even further with a more computationally efficient architecture and the power of parallel computing.
The results obtained from the UTS dataset for the training and testing partitions are equivalent to those of the other FIS methods; for the 12 validation points the results are sub-optimal, and more work is required to improve upon them.
The results obtained for the Bladder cancer prediction were superior to those of the other models, excluding the type-2 FIS. It should be mentioned that even better results may have been obtained by modifying the model to perform a proper survival analysis, which is out of the scope of this work. The fact that it demonstrated a superior performance compared with state-of-the-art models shows promise for utilizing SICFIS in other applications.
The results obtained from the superconductivity dataset are comparable with those of the ANN and ANFIS models, demonstrating the capabilities of the normalized and fast SICFIS models to perform predictions with large datasets.
The normalized and the fast SICFIS models provide similar results. The slight reduction in the standard deviation observed in the result summary may indicate a more consistent performance from the normalized-SICFIS model. The fast-SICFIS model trains around two times faster than the normalized-SICFIS model, and this reduction in computational time may become more significant for larger datasets. The trade-off between computational speed and consistency should therefore be weighed for each application; in real-time applications, for example, a considerable reduction in computational time would be of great benefit. For datasets with a large number of categorical features the fast-SICFIS model would in theory be the better choice, as demonstrated by the results obtained on the cancer dataset.
In addition to the superior performance obtained from both SICFIS models, their interpretability and transparency were demonstrated. Among the different knowledge representation methods, it can be argued that the magnitude-phase plots provide crucial information for the validation of the model and the extraction of knowledge. Moreover, their interpretability is not affected when partitions overlap or when the number of partitions increases, as may be the case with the vector partition plots and the cosine distance matrix plot.
In comparison with type-1 and type-2 FISs, the SICFIS model provides better insight into the individual effect of a feature on the overall performance of a model. Additionally, the SICFIS rule-base can be represented utilizing the traditional type-1 fuzzy rule-base, with additional control over the granularity of the information presented.
Chapter 5
The Adaptive Neuro Fuzzy Inference System with
Single Input Complex Fuzzy Inference System
Consequences
The TSK FIS is a rule-based model whose premises are composed of linguistic variables and whose consequences are composed of functions, most commonly linear regression models. Each rule represents a region of the dataset that can be approximated by a local linear model; this divide-and-conquer strategy allows complex nonlinear systems to be modelled as a combination of interpretable linear models. Defining fuzzy boundaries allows for a better representation of the local models and improves the prediction accuracy for data points located within the boundaries of two or more local regions. Dividing a large and complex problem into local interpretable models may, however, become problematic as the complexity increases: the larger and more complex a dataset is, the more rules are required to model its behaviour, hence decreasing interpretability.
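The divide-and-conquer idea can be made concrete: a first-order TSK output is the firing-strength-weighted blend of local linear models. An illustrative sketch with Gaussian premises and product t-norm (not the thesis implementation; all names are assumptions):

```python
import numpy as np

def tsk_output(x, centers, spreads, coeffs, intercepts):
    """First-order TSK FIS with Gaussian premises and product t-norm.

    x: (P,) input; centers, spreads: (R, P) premise parameters;
    coeffs: (R, P) and intercepts: (R,) define one linear model per rule."""
    mu = np.exp(-0.5 * ((x - centers) / spreads) ** 2)  # per-feature memberships
    w = mu.prod(axis=1)                                 # rule firing strengths
    w_bar = w / w.sum()                                 # normalized strengths
    local = coeffs @ x + intercepts                     # local linear consequents
    return float(w_bar @ local)                         # weighted blend
```

An input sitting between two rule centres receives a smooth mixture of the two local models, which is the fuzzy-boundary behaviour described above.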
In order to improve the prediction accuracy of TSK models and reduce the number of rules, some authors have devised different adaptations of the TSK architecture. Models such as the neural networks designed on approximate reasoning architecture [121] and the co-active neuro fuzzy inference system [122] embed ANNs into the TSK FIS architecture with the objective of combining the interpretability of FISs and the prediction accuracy of ANNs. Embedding ANNs into FISs considerably reduces, if not entirely eliminates, their interpretability, as ANNs are black-box models. These models are not to be confused with the popular ANFIS model [76]: ANFIS is a TSK FIS and does not embed an ANN in its architecture but rather utilizes backpropagation learning algorithms to improve its accuracy while maintaining its interpretability.
Other strategies developed to reduce the number of rules and improve the accuracy of the results, while maintaining the transparency and interpretability of the system, include replacing the consequences of a TSK FIS with nonlinear functions. The number of rules is reduced considerably given that the overall architecture of the system is local-nonlinear, which can describe a larger region of the dataset more accurately than linear models. These methods have been applied in control applications: Rajesh [123] includes sinusoidal functions to improve the accuracy of a controller, Sala and Ariño [124] utilize polynomials from Taylor series expansions, Tanaka [125] utilizes a sum of squares for modelling nonlinear dynamical systems, and Dong [126] utilizes local nonlinear TSK rules for the design of a controller.
In this work it is proposed to replace the linear consequences of the TSK with fast-SICFIS models. In Chapter 4 the interpretability properties of the SICFIS were demonstrated; its superior accuracy compared with other models and the considerable reduction in training times, especially in the case of the fast-SICFIS model, were also shown. These properties make it an ideal candidate for improving upon the accuracy of the ANFIS model while retaining its interpretability. The results obtained are comparable with ensembles of ANNs; training times are considerably lower than those of other more complex methods, while interpretability is maintained.
The ANFIS-SICFIS model is a neuro fuzzy inference system based on the popular
ANFIS architecture. The ANFIS-SICFIS premise is composed of a traditional type-1
rule-base and the consequences are composed of SICFIS models. The ANFIS-SICFIS fuzzy rule-base is given in Table 5.1, where x_p represents the input value for a feature p, A^r_p represents a type-1 fuzzy membership function for a rule r and a feature p, and h^r represents the output of the local SICFIS model corresponding to the rule consequent.
The premise of the ANFIS-SICFIS can be represented as a three-layered system: the first layer fuzzifies the input utilizing a Gaussian membership function (5.1), the second layer calculates the rule strength utilizing the product t-norm (5.2), and the third layer normalizes the fired rule strengths (5.3).
$$O^{1}_{r,p} = \mu_{r,p} = \exp\left( -\frac{1}{2}\left( \frac{x_p - c^{RB}_{r,p}}{\sigma^{RB}_{r,p}} \right)^{2} \right) \qquad (5.1)$$

$$O^{2}_{r} = w_{r} = \prod_{p=1}^{P} \mu_{r,p} \qquad (5.2)$$

$$O^{3}_{r} = \overline{w}_{r} = \frac{w_{r}}{\sum_{r=1}^{R} w_{r}} \qquad (5.3)$$
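The three premise layers can be sketched in a few lines; array shapes are assumptions for illustration:

```python
import numpy as np

def premise_layers(x, centers, spreads):
    """Premise layers of the ANFIS-SICFIS.

    x: (P,) input vector; centers, spreads: (R, P) Gaussian parameters.
    Returns the normalized rule strengths (R,)."""
    mu = np.exp(-0.5 * ((x - centers) / spreads) ** 2)  # layer 1, eq (5.1)
    w = mu.prod(axis=1)                                 # layer 2, eq (5.2)
    return w / w.sum()                                  # layer 3, eq (5.3)
```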
The premises of the rules correspond to regions of the dataset. The rules may be defined by an expert or by utilizing a clustering algorithm; a clustering algorithm allows the associations between the inputs and the output in the dataset to be identified [127]. The most common fuzzy clustering algorithm, and the one utilized in this work, is the Fuzzy C-Means (FCM) algorithm [20]. The FCM algorithm is as follows:
Repeat until the objective function converges:

1. Update the fuzzy partition matrix:
$$u_{ij} = \left[ \sum_{k=1}^{C} \left( \frac{\lVert x_i - c^{FCM}_j \rVert}{\lVert x_i - c^{FCM}_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}$$
2. Update the prototypes:
$$c^{FCM}_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}$$
3. Compute the objective function:
$$J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\lVert x_i - c^{FCM}_j \right\rVert^{2}$$
where c^{FCM} are the cluster prototypes and N is the total number of instances in the dataset. From the FCM clustering algorithm it is possible to create a rule-base utilizing the C prototypes and the fuzzy partition matrix u. The centres of the Gaussian membership functions for the rule-base, c^{RB}_{r,p}, are equal to the projections of the prototypes c^{FCM} of the FCM algorithm. The spreads σ^{RB}_{r,p} are calculated utilizing the fuzzy covariance matrix [27] as follows:
$$\mathrm{Cov}_i = \frac{\sum_{j=1}^{N} \left( u_{ij} \right)^{m} \left( a_j - v_i \right)\left( a_j - v_i \right)^{T}}{\sum_{j=1}^{N} \left( u_{ij} \right)^{m}} \qquad (5.4)$$
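The alternating FCM updates above can be sketched as follows (a plain numpy version; the convergence tolerance and random initialization are illustrative choices):

```python
import numpy as np

def fcm(X, C, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy C-Means: alternate membership and prototype updates.

    X: (N, P) data; C: number of clusters; m: fuzzy partition exponent.
    Returns the prototypes (C, P) and the fuzzy partition matrix U (N, C)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], C))
    U /= U.sum(axis=1, keepdims=True)                # rows of U sum to one
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]       # prototype update
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
        d = np.maximum(d, 1e-12)                             # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)         # membership update
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U
```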
The fuzzy partition exponent determines the "fuzziness" of the clustering algorithm. It can be shown that when m=1 the FCM algorithm produces "hard" partitions of the dataset [128]. The degree of fuzziness, or overlap between partitions, can be measured utilizing the partition coefficient shown in (5.6) [20]; the partition coefficient approaches 1 as the partitions become "harder". Similarly, a partition coefficient of the rule-base can be measured utilizing the normalized rule strengths (5.3) instead of the fuzzy partition matrix u, as shown in (5.7).
$$\mathrm{Partition\ Coefficient\ (Rule\text{-}Base)} = \frac{1}{N} \sum_{i=1}^{N} \sum_{r=1}^{R} \left( \overline{w}_{i,r} \right)^{2} \qquad (5.7)$$
where N is the total number of instances, C is the number of clusters and R is the number of rules. Figure 5.1 shows the partition coefficient value as the fuzzy partition exponent m increases for both the FCM and the rule-base. A sharp decline in the FCM partition coefficient is observed as m increases, with a lesser effect on the rule-base partition coefficient. Therefore, to obtain distinguishable local interpretable models it is important to choose a fuzzy partition exponent value between 1 and 2.
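The partition coefficient itself is a one-line computation; the same function applies to the FCM partition matrix u and, with the per-instance normalized rule strengths stacked row-wise, to the rule-base:

```python
import numpy as np

def partition_coefficient(U):
    """Mean of the squared memberships per instance; U is (N x C) with
    rows summing to one. Equals 1 for a hard partition and 1/C for a
    completely fuzzy one."""
    return float((U ** 2).sum() / U.shape[0])
```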
The SICFIS consequent is represented as a nonlinear function; the architecture of the model was presented in Section 4.6 and its equations are summarized below:
$$h^{r}(x) = \sqrt{\left( g^{r}_{\mathrm{Re}} \right)^{2} + \left( g^{r}_{\mathrm{Im}} \right)^{2}} \qquad (5.8)$$

$$g^{r}_{\mathrm{Re}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} A^{r}_{p,s_p} \cos\left( \theta^{r}_{p,s_p} \right) v^{r}_{p,s_p} \qquad (5.9)$$

$$g^{r}_{\mathrm{Im}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} A^{r}_{p,s_p} \sin\left( \theta^{r}_{p,s_p} \right) v^{r}_{p,s_p} \qquad (5.10)$$

$$v^{r}_{p,s_p} = \exp\left( -\frac{1}{2} \left( \frac{x_p - c^{CFR,r}_{p,s_p}}{\sigma^{CFR,r}_{p,s_p}} \right)^{2} \right) \qquad (5.11)$$
Figure 5.1 Fuzzy partition coefficient values given different clusters and changing the
fuzzy partition exponent value.
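The consequent equations (5.8)-(5.11) amount to a sum of phasors weighted by Gaussian memberships; a compact sketch (array shapes are assumptions for illustration):

```python
import numpy as np

def sicfis_output(x, A, theta, c, sigma):
    """Local SICFIS consequent h^r(x), eqs (5.8)-(5.11).

    All parameter arrays have shape (P, S): amplitudes A, phases theta,
    Gaussian centres c and spreads sigma (one row per feature, one
    column per membership function)."""
    v = np.exp(-0.5 * ((x[:, None] - c) / sigma) ** 2)  # eq (5.11)
    g_re = (A * np.cos(theta) * v).sum()                # eq (5.9)
    g_im = (A * np.sin(theta) * v).sum()                # eq (5.10)
    return float(np.hypot(g_re, g_im))                  # eq (5.8)
```

Two terms with a 180° phase difference cancel, which is exactly the negative interference exploited in the interpretability analysis.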
The output of the SICFIS model is a complex number with a phase and a magnitude, a characteristic referred to as the "dual output property"; in Chapter 4 the magnitude of the SICFIS output was used to assess its performance, while the phase was utilized as additional information during the interpretability analysis. To adequately address the dual output property of the SICFIS within the context of the ANFIS-SICFIS model, two different approaches are explored. The first passes the output of the SICFIS as real-valued, that is, only the magnitude information of the output; the second passes the output of the SICFIS as complex-valued.
The first approach is relatively straightforward: the last layers of the ANFIS-SICFIS simply perform an algebraic product between the normalized rule strengths and the real-valued consequents and then sum the outputs of each rule. The output of this model is real-valued, and therefore this approach is hereafter referred to as the real-ANFIS-SICFIS model. The second approach requires an additional layer, a second rule interference layer, which calculates the interference between the rules. The final output of this model is complex-valued; the magnitude is utilized to assess its performance and the phase can be used as additional information. This approach is hereafter referred to as the complex-ANFIS-SICFIS model.
5.2.3 Real-ANFIS-SICFIS
The first three layers of the real-ANFIS-SICFIS represent the premise rule-base of
the system described by the equations (5.1)-(5.3). The fourth layer of the real-ANFIS-
SICFIS is the magnitude of the local SICFIS for a rule r as follows:
$$O^{\mathrm{Real},4}_{r} = h^{r}(x) = \sqrt{\left( g^{r}_{\mathrm{Re}} \right)^{2} + \left( g^{r}_{\mathrm{Im}} \right)^{2}} \qquad (5.12)$$
The final output aggregates the inference between the premises and the consequences of each rule as follows:

$$O^{\mathrm{Real},5} = \sum_{r=1}^{R} \overline{w}_{r}\, h^{r} \qquad (5.13)$$
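A forward pass of the real-ANFIS-SICFIS, combining the premise layers (5.1)-(5.3) with eq (5.13), can be sketched as follows; each local SICFIS is abstracted as a callable returning its magnitude, and all names are illustrative:

```python
import numpy as np

def real_anfis_sicfis(x, rb_centers, rb_spreads, h_fns):
    """Real-ANFIS-SICFIS output, eq (5.13): the weighted sum of local
    SICFIS magnitudes, with one callable h_fns[r] per rule."""
    mu = np.exp(-0.5 * ((x - rb_centers) / rb_spreads) ** 2)
    w = mu.prod(axis=1)
    w_bar = w / w.sum()                       # normalized rule strengths
    h = np.array([f(x) for f in h_fns])       # local SICFIS magnitudes
    return float(w_bar @ h)
```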
The superscript RB is used for the premise parameters of the type-1 fuzzy rule-base and the superscript CFR for the parameters of the local SICFIS consequents.

Consequence parameters:

$$\left\{ A^{r}_{p,s_p},\ \theta^{r}_{p,s_p},\ \sigma^{CFR,r}_{p,s_p},\ c^{CFR,r}_{p,s_p} \right\} \qquad (5.15)$$

Derivatives:

$$\frac{\partial f}{\partial \sigma_{r,p}} = \frac{\partial f}{\partial \mu_{r}} \frac{\partial \mu_{r}}{\partial \sigma_{r,p}} \qquad (5.16)$$

$$\frac{\partial f}{\partial c_{r,p}} = \frac{\partial f}{\partial \mu_{r}} \frac{\partial \mu_{r}}{\partial c_{r,p}} \qquad (5.17)$$

$$\frac{\partial f}{\partial \theta^{r}_{p,s_p}} = \frac{\partial f}{\partial h^{r}} \left( \frac{\partial h^{r}}{\partial g^{r}_{\mathrm{Re}}} \frac{\partial g^{r}_{\mathrm{Re}}}{\partial \theta^{r}_{p,s_p}} + \frac{\partial h^{r}}{\partial g^{r}_{\mathrm{Im}}} \frac{\partial g^{r}_{\mathrm{Im}}}{\partial \theta^{r}_{p,s_p}} \right) \qquad (5.19)$$

$$\frac{\partial f}{\partial c^{CFR,r}_{p,s_p}} = \frac{\partial f}{\partial h^{r}} \left( \frac{\partial h^{r}}{\partial g^{r}_{\mathrm{Re}}} \frac{\partial g^{r}_{\mathrm{Re}}}{\partial v^{r}_{p,s_p}} \frac{\partial v^{r}_{p,s_p}}{\partial c^{CFR,r}_{p,s_p}} + \frac{\partial h^{r}}{\partial g^{r}_{\mathrm{Im}}} \frac{\partial g^{r}_{\mathrm{Im}}}{\partial v^{r}_{p,s_p}} \frac{\partial v^{r}_{p,s_p}}{\partial c^{CFR,r}_{p,s_p}} \right) \qquad (5.21)$$
5.2.4 Complex-ANFIS-SICFIS
The first three layers of the complex-ANFIS-SICFIS represent the premise rule-base of the system described by equations (5.1)-(5.3). The output of the fourth layer of the complex-ANFIS-SICFIS utilizes the real and the imaginary outputs of the SICFIS for a rule r as follows:
$$O^{\mathrm{Complex},4}_{r,\mathrm{Re}} = g^{r}_{\mathrm{Re}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} A^{r}_{p,s_p} \cos\left( \theta^{r}_{p,s_p} \right) v^{r}_{p,s_p} \qquad (5.22)$$

$$O^{\mathrm{Complex},4}_{r,\mathrm{Im}} = g^{r}_{\mathrm{Im}} = \sum_{p=1}^{P} \sum_{s_p=1}^{S_p} A^{r}_{p,s_p} \sin\left( \theta^{r}_{p,s_p} \right) v^{r}_{p,s_p} \qquad (5.23)$$
Given that the output of the fourth layer is a complex number, the complex-ANFIS-SICFIS includes an additional layer which measures the interference between the rules; additionally, each rule consequent is multiplied by the normalized rule strength. The output of the fifth layer is also a complex quantity with a real and an imaginary part as follows:
$$O^{\mathrm{Complex},5}_{\mathrm{Re}} = \sum_{r=1}^{R} \overline{w}_{r}\, g^{r}_{\mathrm{Re}} \qquad (5.24)$$

$$O^{\mathrm{Complex},5}_{\mathrm{Im}} = \sum_{r=1}^{R} \overline{w}_{r}\, g^{r}_{\mathrm{Im}} \qquad (5.25)$$
The output of the sixth layer calculates the magnitude and the phase of the real and
imaginary quantities obtained from the output of the fifth layer. The magnitude is
utilized to make the predictions and measure the performance of the system while the
phase is utilized for additional information.
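Layers five and six, eqs (5.24)-(5.25) followed by the polar conversion, can be sketched as:

```python
import numpy as np

def complex_anfis_sicfis(w_bar, g_re, g_im):
    """Rule interference layer and polar output of the complex-ANFIS-SICFIS.

    w_bar: (R,) normalized rule strengths; g_re, g_im: (R,) real and
    imaginary SICFIS consequents. Returns (magnitude, phase)."""
    o_re = float(w_bar @ g_re)                    # eq (5.24)
    o_im = float(w_bar @ g_im)                    # eq (5.25)
    return np.hypot(o_re, o_im), np.arctan2(o_im, o_re)
```

Rules whose consequents are in antiphase cancel at this layer, so the rule interference is visible directly in the final magnitude.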
Premise parameters:

$$\left\{ \sigma_{r,p},\ c_{r,p} \right\} \qquad (5.27)$$

Consequence parameters:

$$\left\{ A^{r}_{p,s_p},\ \theta^{r}_{p,s_p},\ \sigma^{CFR,r}_{p,s_p},\ c^{CFR,r}_{p,s_p} \right\} \qquad (5.28)$$

Derivatives:

$$\frac{\partial f}{\partial \sigma_{r,p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}} \frac{\partial h_{\mathrm{Re}}}{\partial \mu_{r}} \frac{\partial \mu_{r}}{\partial \sigma_{r,p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}} \frac{\partial h_{\mathrm{Im}}}{\partial \mu_{r}} \frac{\partial \mu_{r}}{\partial \sigma_{r,p}} \qquad (5.29)$$

$$\frac{\partial f}{\partial A^{r}_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}} \frac{\partial h_{\mathrm{Re}}}{\partial g^{r}_{\mathrm{Re}}} \frac{\partial g^{r}_{\mathrm{Re}}}{\partial A^{r}_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}} \frac{\partial h_{\mathrm{Im}}}{\partial g^{r}_{\mathrm{Im}}} \frac{\partial g^{r}_{\mathrm{Im}}}{\partial A^{r}_{p,s_p}} \qquad (5.31)$$
$$\frac{\partial f}{\partial \theta^{r}_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}} \frac{\partial h_{\mathrm{Re}}}{\partial g^{r}_{\mathrm{Re}}} \frac{\partial g^{r}_{\mathrm{Re}}}{\partial \theta^{r}_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}} \frac{\partial h_{\mathrm{Im}}}{\partial g^{r}_{\mathrm{Im}}} \frac{\partial g^{r}_{\mathrm{Im}}}{\partial \theta^{r}_{p,s_p}} \qquad (5.32)$$

$$\frac{\partial f}{\partial \sigma^{CFR,r}_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}} \frac{\partial h_{\mathrm{Re}}}{\partial g^{r}_{\mathrm{Re}}} \frac{\partial g^{r}_{\mathrm{Re}}}{\partial v^{r}_{p,s_p}} \frac{\partial v^{r}_{p,s_p}}{\partial \sigma^{CFR,r}_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}} \frac{\partial h_{\mathrm{Im}}}{\partial g^{r}_{\mathrm{Im}}} \frac{\partial g^{r}_{\mathrm{Im}}}{\partial v^{r}_{p,s_p}} \frac{\partial v^{r}_{p,s_p}}{\partial \sigma^{CFR,r}_{p,s_p}} \qquad (5.33)$$

$$\frac{\partial f}{\partial c^{CFR,r}_{p,s_p}} = \frac{\partial f}{\partial h_{\mathrm{Re}}} \frac{\partial h_{\mathrm{Re}}}{\partial g^{r}_{\mathrm{Re}}} \frac{\partial g^{r}_{\mathrm{Re}}}{\partial v^{r}_{p,s_p}} \frac{\partial v^{r}_{p,s_p}}{\partial c^{CFR,r}_{p,s_p}} + \frac{\partial f}{\partial h_{\mathrm{Im}}} \frac{\partial h_{\mathrm{Im}}}{\partial g^{r}_{\mathrm{Im}}} \frac{\partial g^{r}_{\mathrm{Im}}}{\partial v^{r}_{p,s_p}} \frac{\partial v^{r}_{p,s_p}}{\partial c^{CFR,r}_{p,s_p}} \qquad (5.34)$$
The local performance is assessed as follows. Each instance in a dataset is evaluated utilizing the trained real and complex ANFIS-SICFIS models. Instead of utilizing the prediction of the global model, a local SICFIS model is selected according to the normalized rule strength values: the rule with the highest normalized rule strength corresponds to the local SICFIS utilized in the evaluation of that record. This is repeated for each data point, the results are collected, and the RMSE is calculated for the training, checking and testing partitions. Both ANFIS-SICFIS models utilize the same evaluation method, shown in Algorithm 5.2.
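The winner-takes-all evaluation described above can be sketched as follows; the per-instance rule strengths and local predictions are assumed to be precomputed:

```python
import numpy as np

def local_rmse(w_bar_all, h_all, y):
    """Local performance: for each instance pick the prediction of the
    rule with the highest normalized strength, then compute the RMSE.

    w_bar_all: (N, R) normalized rule strengths; h_all: (N, R) local
    SICFIS predictions; y: (N,) targets."""
    winners = w_bar_all.argmax(axis=1)
    y_local = h_all[np.arange(len(y)), winners]
    return float(np.sqrt(np.mean((y_local - y) ** 2)))
```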
During the training of the ANFIS-SICFIS models the rule-base may be altered, which may affect the local performance of a model. To assess these alterations, three different optimization strategies are implemented. The first optimizes all the parameters at the same time; this is defined as the complete parameter optimization process. The second optimizes the premise parameters and the SICFIS parameters separately, each at different epochs; this is defined as the alternate parameter optimization process. The third optimizes solely the SICFIS parameters; this is defined as the consequent parameter optimization process.
$$\mathrm{RMSE}_{\mathrm{Local}} = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} \left( \hat{y}_{\mathrm{Local},j} - y_j \right)^{2} }$$
It is expected that the consequent optimization strategy will yield the best local performance, given that the premises of the rule-base remain unaltered. Additionally, it is expected that an initial rule-base created with a fuzzy partition coefficient closer to one will improve the local performance. To assess these hypotheses a parameter grid search is performed with the parameters shown in Table 5.2. This exhaustive grid search is applied to the Charpy impact dataset, resulting in the training of 1,440 models.
A summary of the results from the exhaustive grid search can be observed in the four graphs shown in Figure 5.4 and Figure 5.5. The four graphs correspond to the global and local performances of the real and complex ANFIS-SICFIS, showing the mean RMSE of the models with 2, 3 and 4 rules utilizing each of the three optimization strategies: complete, consequents and alternate. The results include the training, checking and testing partition performance, displaying the corresponding proportion of influence on the final error; the total length of these bars represents the complete RMSE. Any performance registering a deviation of more than two standard deviations is treated as an outlier and removed.
No correlation with the fuzzy partition coefficient was found. This may be explained by the graphs in Figure 5.1, as there is no major change in the partition coefficient of the rule-base for values of m between 1 and 2.
It is shown in the graphs below that the worst performing optimization strategy is the alternate parameter optimization process, with only a minor difference between the complete and the consequent optimization results. The complex ANFIS-SICFIS models yielded better results for the local models. Figure 5.6 shows the training times for each of the optimization strategies. It can be observed that the slowest algorithm is the complete parameter optimization process, as its training time grows rapidly with the addition of rules and membership functions.
Figure 5.4: Real and Complex ANFIS-SICFIS global performance for the three optimization processes given 2, 3 and 4 rules. Stacked bar chart.
Figure 5.5: Real and Complex ANFIS-SICFIS local performance for the three optimization processes given 2, 3 and 4 rules. Stacked bar chart.
Figure 5.6 Training times for the complex-ANFIS-SICFIS model utilizing the alternate,
consequent and complete parameter optimization method with a varying number of
rules and membership functions (mF). Overlapping bar chart.
The fastest, yet worst performing, optimization process is the alternate optimization algorithm. Therefore, the consequent optimization algorithm offers the best trade-off between local-global performance and training times.
It is concluded that the best results are obtained utilizing the complex-ANFIS-
SICFIS model, and the consequent optimization process. Therefore in the following
section and simulations it is the model selected to obtain the results.
A parameter grid search was performed on the Charpy impact test in the previous
section to determine the performance of the two different ANFIS-SICFIS models and
various optimization methods. A more detailed analysis based on the previous results
obtained is now performed; the details of the new grid search are shown in Table 5.3.
The training, checking and testing partition remains 65-18-17 respectively.
Table 5.3: Parameter grid search for the Charpy impact test.
Parameter Values
Models {Complex-ANFIS-SICFIS}
Optimization Method {Consequents}
Number of rules {2,3,4,5,6}
Number of membership functions per feature (SICFIS) {2,3,4}
Fuzzy partition coefficient values {1.2,1.8}
Number of k-fold cross validation per model 5
Maximum number of epochs 70
The mean results and the corresponding standard deviations given a number of rules
are shown in Table 5.4 and Table 5.5 respectively. The mean RMSE for the training
partition decreases with the addition of rules, while the checking and testing mean
RMSE increase. The effect is greater for the local performance. The sharp increase in
the standard deviation and mean RMSE given 6 rules indicates overfitting.
Table 5.4: Charpy Mean RMSE results given different number of rules.
Training Checking Testing All
No. Rules Global Local Global Local Global Local Global Local
2 15.78 15.81 19.21 19.32 19.93 19.94 17.22 17.27
3 15.19 15.37 18.83 19.21 19.19 19.46 16.67 16.90
4 14.68 16.19 20.13 21.44 20.32 21.39 16.87 18.24
5 14.53 15.43 19.69 20.62 19.98 20.97 16.62 17.54
6 14.58 20.02 19.29 24.15 19.27 24.41 16.42 21.68
Table 5.5: Charpy Standard deviation results given different number of rules.
Training Checking Testing All
No. Rules Global Local Global Local Global Local Global Local
2 0.947 0.976 1.421 1.377 1.303 1.317 0.650 0.674
3 1.268 1.100 1.385 1.386 1.419 1.530 0.604 0.479
4 0.946 1.645 1.755 2.191 1.771 2.260 0.478 1.378
5 1.341 1.396 1.799 1.890 1.584 1.925 0.893 1.071
6 1.100 10.738 1.398 9.815 1.430 9.907 0.658 10.288
Similar results are observed in Figure 5.7, where the addition of membership
functions results in a decreasing RMSE for the training partition and an increasing
RMSE for the testing partition.
The best results given different numbers of rules are shown in Table 5.6, with the
corresponding number of membership functions per feature. For comparison purposes,
the results obtained from different studies utilizing ANNs are shown in Table 5.7. The
first is an Ensemble-NN [129], the second is an ANN model whose hyperparameters
are selected with a GA [129], and the third is a GA-NN Ensemble [129], which
optimizes the hyperparameters as well as the ensemble structure. The best
out-of-sample RMSE was obtained with a 2-rule complex-ANFIS-SICFIS model with
4 membership functions per feature. The regression plots of the global and local models
are shown in Figure 5.8 and Figure 5.9 respectively.
The same parameter grid search shown in Table 5.3 is applied to the UTS
dataset, with a training-checking partition of 70-30 respectively and 12 data points for
validation. The mean results and the corresponding standard deviations given a
number of rules are shown in Table 5.8 and Table 5.9 respectively. The mean RMSE
for the training partition decreases with the addition of rules, while the checking and
testing mean RMSE increase slightly. However, no major differences are observed
between the global and local performances for the training and checking partitions with
the addition of rules. An increase in the mean RMSE is observed for the validation
partition.
Just as in the case of the Charpy impact test, it is observed in Figure 5.10 that the
addition of membership functions results in a decreasing RMSE for the training
partition and an increasing RMSE for the testing partition.
Table 5.9: UTS standard deviation of results given different number of rules.
Training Checking Testing All
No. Rules Global Local Global Local Global Local Global Local
2 1.66 1.83 3.01 3.19 10.15 11.35 1.67 1.95
3 2.22 2.18 4.00 4.61 12.14 11.78 2.41 2.58
4 1.52 2.12 3.62 3.95 9.67 11.43 1.71 2.13
5 1.67 1.56 3.50 3.78 10.03 9.64 1.80 2.02
6 1.97 2.56 2.61 3.33 12.48 12.11 1.37 2.28
The best results given a number of rules are shown in Table 5.10. For comparison
purposes, the results obtained from different studies are shown in Table 5.11, as well as
the results obtained in Chapter 4. The best out-of-sample RMSE was obtained with a
5-rule complex-ANFIS-SICFIS model with 3 membership functions per feature. The
regression plots of the global and local models are shown in Figure 5.11 and Figure
5.12 respectively.
A smaller parameter grid search is performed on the Bladder Cancer dataset, shown
in Table 5.12, with a 70-30 partition for training and testing respectively. The mean and
standard deviation RMSE results obtained from the parameter grid search given a
number of rules are shown in Table 5.13 and Table 5.14 respectively. A decrease in
performance for the testing partition is observed with the addition of rules with just a
slight increase in the training partition performance.
The best results obtained given a number of rules are shown in Table 5.15. The
corresponding ROC curves and score scatter plots obtained from the best performing
models are shown in Figure 5.13, Figure 5.14 and Figure 5.15. It is evident from
Table 5.13 and Table 5.15 that the ANFIS-SICFIS model overfits the bladder cancer
dataset. For comparison purposes, Table 5.16 shows the results obtained in previous
studies as well as the results obtained in Chapter 4.
Table 5.12: Parameter grid search for the Bladder Cancer dataset.
Parameter Values
Models {Complex-ANFIS-SICFIS}
Optimization Method {Consequents}
Number of rules {2,3,4,5}
Number of membership functions per feature (SICFIS) {2,3,4}
Fuzzy partition coefficient values {1.2,1.8}
Number of k-fold cross validation per model 5
Maximum number of epochs 70
Table 5.15: Bladder Cancer best results given a number of rules and membership
functions.
Training Testing All
No. Rules No. mF* Global Local Global Local Global Local
2 2 0.9122 0.9034 0.8886 0.8734 0.9040 0.8928
3 3 0.9138 0.8658 0.8935 0.8667 0.9069 0.8665
4 3 0.9086 0.8481 0.8994 0.8397 0.9055 0.8453
5 3 0.9144 0.8441 0.8915 0.8041 0.9065 0.8292
*mF: membership function
Figure 5.13: Bladder cancer ROC curves for the global (a) and local performance (b).
A summary of the results obtained from the superconductivity dataset is shown in
Table 5.17 and Table 5.18. The best results obtained given a number of rules and
membership functions are shown in Table 5.19. A result comparison is shown in Table
5.20.
5.4 Summary
This work presented an improvement of the traditional ANFIS model, whose linear
consequents are replaced with the SICFIS model, a non-linear and highly interpretable
model. The compactness, interpretability and low computation required to
train local SICFIS models allow accurate rule-base systems to be created with a
considerably low number of rules.
From the optimization strategies presented, it was determined that optimizing solely
the consequents returns the best performance, especially for the local performance
evaluation; additionally, this strategy considerably reduces the training times for
larger datasets. The design of optimization algorithms that modify the premises of the
rule base while taking its interpretability into consideration would require
modifications to the objective function; the application of evolutionary algorithms in
the optimization process has been proposed in order to maintain the interpretability
of the rule-base premises [93], [95], [130], [131]. It is important to consider that
evolutionary algorithms and other global optimization methods require evaluating a
large number of models, which increases computation times exponentially. It is
therefore concluded that the premises of the rule base should remain unchanged during
the optimization process.
The ANFIS-SICFIS was tested on four different datasets. The results obtained from
the Charpy impact test are comparable with those of large and complex ANN models;
this performance was obtained with just two rules. The results for the UTS dataset are
the best obtained so far in the literature. The model underperformed and overfitted on
the Cancer dataset; this may be caused by the large number of categorical features in
the dataset and the application of a least-squares optimization algorithm instead of
performing a survival analysis, which is out of the scope of this work. Results obtained
from the superconductivity dataset are superior to most modelling strategies.
Chapter 6
Mamdani Single Input Complex Fuzzy Inference
System
6.1 Introduction
Complex membership functions have been proposed previously [57]. The complex
Gaussian membership functions proposed to date [74], [75] do not represent a trajectory
in three dimensions with a coupled phase and magnitude. The sinusoidal membership
function proposed by Dick [73] does represent a three-dimensional trajectory, where
both the magnitude and the phase are coupled.
Sinusoidal and Gaussian membership functions are utilized for different purposes:
while the sinusoidal membership function is utilized to model semi-periodic behaviour,
Gaussian membership functions represent a region of space at a particular time [57].
The proposed complex Gaussian membership function is therefore the first linguistic
membership function based on the CFS and CFL developed by Ramot et al. [8], [55].
The proposed membership function is designed to comply with the following points:
1) The magnitude represents a type-1 fuzzy membership function; the phase is a non-
fuzzy quantity that represents the "context".
2) A complex membership function in 3 dimensions should represent a trajectory,
not a surface.
3) A complex membership function should be equivalent to a traditional type-1
membership function when all the phases are equal to zero.
4) The defuzzification results in a crisp complex number, with a magnitude and a
phase.
5) Given points (3) and (4), when all the phases in a system are equal, that is, when
no interference occurs, the resultant magnitude should be equivalent to that of a
traditional type-1 system. Given that an ordering does not exist in complex numbers,
the phase should be taken into consideration together with a frame of reference.
$$\mu_S(x) = r_S(x)\,e^{j\omega_S(x)} \tag{6.1}$$
where $r_S$ represents the magnitude and $\omega_S$ the phase. The complex membership
functions described in the following section map real-valued inputs to the complex
domain, $\mathbb{R} \to \mathbb{C}$.
$$y_{\text{Singleton}} = \mu_{\text{Singleton}}(k) = \begin{cases} 1, & \text{if } k = b \\ 0, & \text{if } k \neq b \end{cases} \tag{6.2}$$

$$y_{\text{Gaussian}} = \mu_{\text{Gaussian}}(k) = \exp\!\left(-\frac{1}{2}\left(\frac{k-b}{\sigma}\right)^{2}\right) \tag{6.3}$$
Figure 6.1 shows the two-dimensional view of a Gaussian membership function and
a singleton membership function; both functions' centres are equal to 0.5 and the spread
of the Gaussian membership function is equal to 0.2.
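The two membership functions can be sketched directly from (6.2) and (6.3); the function names below are illustrative:

```python
import math

def singleton_mf(k, b):
    """Type-1 singleton membership function, eq. (6.2)."""
    return 1.0 if k == b else 0.0

def gaussian_mf(k, b, sigma):
    """Type-1 Gaussian membership function, eq. (6.3)."""
    return math.exp(-0.5 * ((k - b) / sigma) ** 2)

# Both centred at 0.5; Gaussian spread 0.2, as in Figure 6.1.
print(singleton_mf(0.5, 0.5))       # 1.0 at the centre, 0 elsewhere
print(gaussian_mf(0.5, 0.5, 0.2))   # Gaussian peaks at 1.0 at its centre
```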
$$z = \left|\mu(k)\right| \tag{6.4}$$

$$\arg\!\left(\mu(k)\right) = \theta \tag{6.5}$$
where $z$ represents the magnitude of the fuzzified variable, $\theta$ represents the phase
of the membership function, and Re and Im represent the real and the imaginary
components.
It should be noted that the magnitude of the complex singleton membership function
is equivalent to a type-1 singleton membership function, with the addition of the context
represented by the phase variable $\theta$. This is in accordance with points 1 and 3, and the
complex singleton membership function can be thought of as a traditional type-1
singleton membership function whose centre rotates according to the value of the
context variable $\theta$. An example of a singleton membership function located at $b = 0.5$
and $\theta = 45°$ is shown in Figure 6.2. The dotted lines represent the slope along which
the trajectory of $k$ travels, as well as the location of Re and Im for visual reference.
$$x = k\cos(\theta) \tag{6.8}$$

$$y = k\sin(\theta) \tag{6.9}$$

$$z = \mu(k) = \exp\!\left(-\frac{1}{2}\left(\frac{\left\|(x,y)-b\right\|}{\sigma}\right)^{2}\right) \tag{6.10}$$

$$\arg(x,y) = \theta \tag{6.11}$$
where $x$ and $y$ represent the real and the imaginary values. Because the phase is
constant, the values move in a straight line whose slope represents the phase $\theta$. The
$z$-axis represents the magnitude of the CFS, which corresponds to a traditional type-1
fuzzy set; its shape should therefore be that of a type-1 Gaussian membership function.
An example of a complex Gaussian membership function is shown in Figure 6.3.
During the rule interference process, it is necessary to aggregate the real and
imaginary parts respectively. In the case of the complex Gaussian membership function,
it is necessary to separate the real and imaginary components of the complex Gaussian
membership function to assign the proportional degree of membership. This can be
accomplished by multiplying the Gaussian membership function by the absolute value
of a sine and cosine function. The absolute value is utilized given that the membership
value needs to remain positive. The real and imaginary components of a complex
Gaussian membership function are shown in Figure 6.4. The complex Gaussian
membership function is as follows:
$$\mu(k) = \exp\!\left(-\frac{1}{2}\left(\frac{\left\|(x,y)-b\right\|}{\sigma}\right)^{2}\right)\left|\cos(\theta)\right| + \exp\!\left(-\frac{1}{2}\left(\frac{\left\|(x,y)-b\right\|}{\sigma}\right)^{2}\right)\left|\sin(\theta)\right|\,j \tag{6.12}$$
where $\left\|(x,y)-b\right\|$ represents the distance from the point $(x,y)$ to the centre $b$ of the
membership function, $\sigma$ is the spread and $\theta$ is the angle of the complex Gaussian
membership function.
Figure 6.4: Three-dimensional view of a complex Gaussian membership function and
the corresponding real and imaginary projections. Centre $b = 0.5$, spread $\sigma = 0.2$ and
phase $\theta = 45°$.
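A minimal numerical sketch of (6.12), assuming the distance term reduces to $(k-b)$ along the trajectory; the function name is illustrative:

```python
import math

def complex_gaussian_mf(k, b, sigma, theta):
    """Complex Gaussian membership function, eq. (6.12).
    The input k travels along the line with slope theta, so the distance
    to the centre b is measured along that trajectory (an assumption)."""
    mag = math.exp(-0.5 * ((k - b) / sigma) ** 2)
    # |cos| and |sin| keep both projected memberships non-negative.
    return complex(mag * abs(math.cos(theta)), mag * abs(math.sin(theta)))

mu = complex_gaussian_mf(0.5, 0.5, 0.2, math.radians(45))
print(abs(mu))  # magnitude equals the type-1 Gaussian value (here 1.0)
```

Since $|\cos\theta|^2 + |\sin\theta|^2 = 1$, the magnitude of the returned value is exactly the type-1 Gaussian membership, in line with point 1; with $\theta = 0$ the function collapses to the real-valued type-1 case, in line with point 3.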
The interference and defuzzification operations are relatively straightforward. The
complex Gaussian membership function is represented by its real and imaginary parts,
each aggregated respectively, creating an interference. The obtained crisp value is a
complex quantity; the measured output is its magnitude, while the phase is used
for additional information. The COG defuzzification is as follows:
$$h^{\mathrm{Re}} = \frac{\sum_{d=1}^{D}\sum_{r=1}^{R} \mu^{r}_{\mathrm{Re}}(k_d)\, x_d}{\sum_{d=1}^{D}\sum_{r=1}^{R} \mu^{r}_{\mathrm{Re}}(k_d)} \tag{6.13}$$

$$h^{\mathrm{Im}} = \frac{\sum_{d=1}^{D}\sum_{r=1}^{R} \mu^{r}_{\mathrm{Im}}(k_d)\, y_d}{\sum_{d=1}^{D}\sum_{r=1}^{R} \mu^{r}_{\mathrm{Im}}(k_d)} \tag{6.14}$$

$$f(x_i) = \sqrt{\left(h^{\mathrm{Re}}\right)^{2} + \left(h^{\mathrm{Im}}\right)^{2}} \tag{6.15}$$
As explained in the previous section, the real and the imaginary parts of the complex
Gaussian membership function correspond to the projections onto their respective axes.
The particle moves at $k$ intervals in the space, at a rate of $k\cos(\theta)$ along the x-axis and
$k\sin(\theta)$ along the y-axis, as shown in Figure 6.5. Equations (6.12)-(6.15) comply with
the objectives formulated at the beginning of this chapter.
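The COG defuzzification amounts to two weighted averages followed by a magnitude; a minimal sketch of (6.13)-(6.15), with illustrative names:

```python
import math

def cog_defuzzify(mu_re, mu_im, x, y):
    """Centre-of-gravity defuzzification, eqs. (6.13)-(6.15).
    mu_re[r][d], mu_im[r][d]: real/imaginary memberships of rule r at
    sample d; x[d], y[d]: real/imaginary coordinates of the universe."""
    R, D = len(mu_re), len(x)
    h_re = (sum(mu_re[r][d] * x[d] for r in range(R) for d in range(D))
            / sum(mu_re[r][d] for r in range(R) for d in range(D)))
    h_im = (sum(mu_im[r][d] * y[d] for r in range(R) for d in range(D))
            / sum(mu_im[r][d] for r in range(R) for d in range(D)))
    # eq. (6.15): the crisp output is the magnitude of the complex value
    return math.hypot(h_re, h_im)
```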
One of the essential requirements for the proposed complex Gaussian membership
function is the equivalence to a type-1 membership function when all the phases are
equal to zero. Additionally, the proposed membership function is equivalent to a type-1
system when all the phases in a system are equal: the magnitude of the defuzzified
value remains constant, as there is no interference. Below is an example of the
defuzzification of two complex Gaussian membership functions and the defuzzification
of two type-1 Gaussian membership functions. Table 6.1 shows the parameters of both
the complex and the type-1 Gaussian membership functions. The graphical
representations of the defuzzification are shown in Figure 6.6 and Figure 6.7 for the
type-1 and the complex membership functions respectively.
Both the magnitude of the complex defuzzified value and the absolute value of the
type-1 defuzzification are the same. Complex numbers are not ordered; therefore, the
resultant number has a phase of 240° while the type-1 quantity has a negative sign.
The Mamdani-SICFIS, just as the SICFIS, is a single-rule-per-feature-partition rule-
base FIS. Each rule has one premise and one consequent; the premises are composed of
type-1 Gaussian membership functions and the consequents are composed of the complex
Gaussian membership function defined in (6.12).
IF $x_1$ is $A_1^{1}$ THEN $y$ is $\mu_{1,1}$
IF $x_1$ is $A_1^{2}$ THEN $y$ is $\mu_{1,2}$
$\vdots$
IF $x_2$ is $A_2^{1}$ THEN $y$ is $\mu_{2,1}$
IF $x_2$ is $A_2^{2}$ THEN $y$ is $\mu_{2,2}$
$\vdots$
IF $x_P$ is $A_P^{S_P}$ THEN $y$ is $\mu_{P,S_P}$
The Mamdani-SICFIS can be described as a six-layer FIS. The first layer fuzzifies
the input utilizing a type-1 Gaussian membership function as follows:
$$O^{1}_{p,s_p} = \omega_{p,s_p}(x_p) = \exp\!\left(-\frac{1}{2}\left(\frac{x_p - c_{p,s_p}}{\sigma_{p,s_p}}\right)^{2}\right) \tag{6.16}$$
The second layer calculates the consequents of the rules utilizing the complex
Gaussian membership function; its real and imaginary components are as follows:
$$O^{2,\mathrm{Re}}_{p,s_p} = \mu^{\mathrm{Re}}_{p,s_p} = \exp\!\left(-\frac{1}{2}\left(\frac{k_j - c'_{p,s_p}}{\sigma'_{p,s_p}}\right)^{2}\right)\left|\cos(\theta_{p,s_p})\right| \tag{6.17}$$

$$O^{2,\mathrm{Im}}_{p,s_p} = \mu^{\mathrm{Im}}_{p,s_p} = \exp\!\left(-\frac{1}{2}\left(\frac{k_j - c'_{p,s_p}}{\sigma'_{p,s_p}}\right)^{2}\right)\left|\sin(\theta_{p,s_p})\right| \tag{6.18}$$
The third layer aggregates the real and imaginary components of the complex
Gaussian membership function respectively.
$$O^{3,\mathrm{Re}}_{p} = \sum_{s_p=1}^{S_p} \omega_{p,s_p}(x_p)\,\mu^{\mathrm{Re}}_{p,s_p} \tag{6.19}$$

$$O^{3,\mathrm{Im}}_{p} = \sum_{s_p=1}^{S_p} \omega_{p,s_p}(x_p)\,\mu^{\mathrm{Im}}_{p,s_p} \tag{6.20}$$
The fourth layer performs the centre-of-gravity defuzzification for each feature:

$$O^{4,\mathrm{Re}}_{p} = \frac{\sum_{s_p=1}^{S_p}\sum_{d=1}^{D} \mu^{\mathrm{Re}}_{p,s_p}(k_d)\,\omega_{p,s_p}(x_p)\, k_d \cos(\theta_{p,s_p})}{\sum_{s_p=1}^{S_p}\sum_{d=1}^{D} \mu^{\mathrm{Re}}_{p,s_p}(k_d)\,\omega_{p,s_p}(x_p)} \tag{6.21}$$

$$O^{4,\mathrm{Im}}_{p} = \frac{\sum_{s_p=1}^{S_p}\sum_{d=1}^{D} \mu^{\mathrm{Im}}_{p,s_p}(k_d)\,\omega_{p,s_p}(x_p)\, k_d \sin(\theta_{p,s_p})}{\sum_{s_p=1}^{S_p}\sum_{d=1}^{D} \mu^{\mathrm{Im}}_{p,s_p}(k_d)\,\omega_{p,s_p}(x_p)} \tag{6.22}$$
The fifth layer performs the rule interference; its result is the output of the system,
as follows:

$$O^{5,\mathrm{Re}} = h^{\mathrm{Re}} = \sum_{p=1}^{P} O^{4,\mathrm{Re}}_{p} \tag{6.23}$$

$$O^{5,\mathrm{Im}} = h^{\mathrm{Im}} = \sum_{p=1}^{P} O^{4,\mathrm{Im}}_{p} \tag{6.24}$$

$$f(x) = \sqrt{\left(h^{\mathrm{Re}}\right)^{2} + \left(h^{\mathrm{Im}}\right)^{2}} \tag{6.25}$$
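The layer equations (6.16)-(6.25) can be sketched end-to-end. The following is a minimal forward pass, assuming a discretized output universe `k_grid` and per-feature parameter tuples; all names are illustrative, and the phases are assumed strictly between 0° and 90° so that both projected aggregations are non-zero:

```python
import math

def premise(xp, c, s):
    """Layer 1, eq. (6.16): type-1 Gaussian premise membership."""
    return math.exp(-0.5 * ((xp - c) / s) ** 2)

def mamdani_sicfis(x, params, k_grid):
    """Forward pass of the Mamdani-SICFIS, eqs. (6.16)-(6.25).
    params[p]: list of (c_prem, s_prem, c_cons, s_cons, theta) tuples,
    one per membership function of feature p (names are assumptions)."""
    h_re = h_im = 0.0
    for xp, mfs in zip(x, params):
        num_re = den_re = num_im = den_im = 0.0
        for (cp, sp, cc, sc, th) in mfs:
            w = premise(xp, cp, sp)
            for k in k_grid:
                # layer 2, eqs. (6.17)-(6.18): complex Gaussian consequent
                g = math.exp(-0.5 * ((k - cc) / sc) ** 2)
                mre, mim = g * abs(math.cos(th)), g * abs(math.sin(th))
                # layers 3-4, eqs. (6.19)-(6.22): weighted COG aggregation
                num_re += mre * w * k * math.cos(th)
                den_re += mre * w
                num_im += mim * w * k * math.sin(th)
                den_im += mim * w
        h_re += num_re / den_re   # layer 5, eq. (6.23)
        h_im += num_im / den_im   # layer 5, eq. (6.24)
    return math.hypot(h_re, h_im)  # output magnitude, eq. (6.25)
```

With a single membership function the real and imaginary ratios reduce to $\cos\theta$ and $\sin\theta$ times the Gaussian-weighted mean of $k$, so the output magnitude recovers that mean, consistent with the no-interference requirement of point 5.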
6.3.1 Optimization
The optimization algorithm is the LM and the derivative equations are as follows:
$$\frac{\partial f}{\partial \theta_{p,s_p}} = \frac{\partial f}{\partial h^{\mathrm{Re}}}\,\frac{\partial h^{\mathrm{Re}}}{\partial \theta_{p,s_p}} + \frac{\partial f}{\partial h^{\mathrm{Im}}}\,\frac{\partial h^{\mathrm{Im}}}{\partial \theta_{p,s_p}} \tag{6.26}$$

$$\frac{\partial f}{\partial \sigma_{p,s_p}} = \frac{\partial f}{\partial h^{\mathrm{Re}}}\,\frac{\partial h^{\mathrm{Re}}}{\partial \sigma_{p,s_p}} + \frac{\partial f}{\partial h^{\mathrm{Im}}}\,\frac{\partial h^{\mathrm{Im}}}{\partial \sigma_{p,s_p}} \tag{6.27}$$

$$\frac{\partial f}{\partial c_{p,s_p}} = \frac{\partial f}{\partial h^{\mathrm{Re}}}\,\frac{\partial h^{\mathrm{Re}}}{\partial c_{p,s_p}} + \frac{\partial f}{\partial h^{\mathrm{Im}}}\,\frac{\partial h^{\mathrm{Im}}}{\partial c_{p,s_p}} \tag{6.28}$$

$$\frac{\partial f}{\partial \sigma'_{p,s_p}} = \frac{\partial f}{\partial h^{\mathrm{Re}}}\,\frac{\partial h^{\mathrm{Re}}}{\partial \sigma'_{p,s_p}} + \frac{\partial f}{\partial h^{\mathrm{Im}}}\,\frac{\partial h^{\mathrm{Im}}}{\partial \sigma'_{p,s_p}} \tag{6.29}$$

$$\frac{\partial f}{\partial c'_{p,s_p}} = \frac{\partial f}{\partial h^{\mathrm{Re}}}\,\frac{\partial h^{\mathrm{Re}}}{\partial c'_{p,s_p}} + \frac{\partial f}{\partial h^{\mathrm{Im}}}\,\frac{\partial h^{\mathrm{Im}}}{\partial c'_{p,s_p}} \tag{6.30}$$
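The outer factors $\partial f/\partial h^{\mathrm{Re}}$ and $\partial f/\partial h^{\mathrm{Im}}$ in these chain rules follow directly from (6.25): for $f = \sqrt{(h^{\mathrm{Re}})^2 + (h^{\mathrm{Im}})^2}$, $\partial f/\partial h^{\mathrm{Re}} = h^{\mathrm{Re}}/f$. A quick finite-difference check with illustrative values:

```python
import math

def f(h_re, h_im):
    """Output magnitude, eq. (6.25)."""
    return math.hypot(h_re, h_im)

h_re, h_im = 0.6, 0.8
df_dhre = h_re / f(h_re, h_im)   # analytic: d f / d h_re = h_re / f

eps = 1e-6                       # central finite difference
fd = (f(h_re + eps, h_im) - f(h_re - eps, h_im)) / (2 * eps)
print(df_dhre, fd)               # both approximately 0.6
```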
6.4 Results
For the Charpy impact dataset, the parameter grid is shown in Table 6.3. The RMSE
index is used to measure the performance of the models. A summary of the results of
the models is shown in Table 6.4. The best results given a number of membership
functions are shown in Table 6.5. The regression plot of the best performing model is
shown in Figure 6.9.
For the UTS dataset, the parameter grid is shown in Table 6.6; the RMSE is used to
measure the performance of the model. A summary of the results of the models is shown
in Table 6.7. The best results given a number of membership functions are shown in
Table 6.8. The regression plots of the best performing model are shown in Figure 6.10.
For the Bladder Cancer dataset, the parameter grid is shown in Table 6.9; the RMSE
is used to measure the performance of the models during training. A summary of the
results of the models measured utilizing the AUC is shown in Table 6.10. The best
results given a number of membership functions are shown in Table 6.11. The ROC
curves of the best performing model are shown in Figure 6.11 and the scatter plot of the
scores is shown in Figure 6.12.
The superconductivity results are shown in Table 6.12. The data partition is 65-18-
17 for training, checking and testing respectively.
To compare the Mamdani, normalized and fast SICFIS models beyond
prediction accuracy, the magnitude and phase plots of the
output of three features (carbon, tempering temperature and impact temperature) are
shown in Figure 6.13, Figure 6.14 and Figure 6.15 respectively; each feature is
partitioned by 5 membership functions.
Table 6.13: Charpy impact normalized, fast and Mamdani-SICFIS best results given 5
membership functions (mF).
Training Checking Testing All
Normalized 5mF 15.23 21.12 19.75 17.25
Fast 5mF 15.38 19.63 18.52 16.77
Mamdani 5mF 16.66 18.89 18.03 17.32
On the one hand, the sharp changes shown in Figure 6.15 may result in overfitting;
on the other hand, the small changes shown in Figure 6.13 may result in an
underperforming model. From Table 6.13 it can be observed that the Mamdani-SICFIS
model obtained the best out-of-sample RMSE in comparison with the normalized and
fast SICFIS models. Therefore, it may be concluded that the Mamdani-SICFIS model
models uncertainties more appropriately for the Charpy impact test dataset than the
normalized and fast SICFIS models.
Figure 6.13: Magnitude-Phase plots for the Mamdani-SICFIS model for Carbon,
Tempering Temperature (T.Temp) and Impact Temperature (Imp. Temp).
Figure 6.14: Magnitude-Phase plots for the Normalized-SICFIS model for Carbon,
Tempering Temperature (T.Temp) and Impact Temperature (Imp. Temp).
Figure 6.15: Magnitude-Phase plots for the Fast-SICFIS model for Carbon, Tempering
Temperature (T.Temp) and Impact Temperature (Imp. Temp).
6.6 Summary
The proposed complex Gaussian membership function was shown to be equivalent
to a type-1 membership function in cases in which all the phases in the system are
aligned, that is, when no interference occurs in the system.
The results obtained from the Mamdani-SICFIS model are comparable with those of
other FIS systems such as the RBFN and the ANFIS models. The results did not
outperform the singleton-SICFIS model. These results are consistent with type-1
Mamdani FISs, which are known to be less accurate than RBFN and TSK FISs. The
reduced accuracy can be compensated for by an increase in the interpretability of the
model.
Chapter 7
Feature Selection Algorithm with Fuzzy Rough Sets
and the Single Input Complex Fuzzy Inference System
7.1 Introduction
The SICFIS model introduced in Chapter 4 presents novel methods for interpreting
and extracting knowledge. The SICFIS model maps real-valued inputs into the complex
domain, representing the relationship between input and output variables as
interferences. The magnitude-phase plots introduced in section 4.4.3 display the
behaviour of the system given any combination of inputs within a range of operation.
A filter method utilizing complex-valued statistics and the information extracted from
the magnitude-phase plots is devised and implemented in four real-world datasets
utilized in this work.
For comparison purposes, a wrapper method utilizing the SICFIS model and a
filter/wrapper method utilizing fuzzy rough sets are implemented on the Charpy, UTS
and Bladder Cancer datasets previously studied, in order to compare the performance of
the SICFIS filter. For the superconductivity dataset, a comparison is presented against
the results reported in [109].
Wrapper methods select a subset of features based on the impact these features have
on the prediction accuracy. Wrapper methods are "model agnostic", meaning that any
model can be selected, including simple linear models or more complex machine
learning models such as ANNs. Wrapper methods can be considered "brute force", as
they require computing a large number of models to derive a proper subset of features.
These methods become intractable as the dimension of the dataset increases, given that
the number of models to evaluate grows exponentially. To reduce the size of
the grid search, it is possible to implement greedy search strategies, which can be either
forward selection or backward elimination [136].
In forward selection, at each iteration each candidate feature is added to the current
subset, and the resulting models are trained and compared; the best performing feature
is added to the subset of features and, once added, remains part of the remaining
iterations of the algorithm. This process is repeated until an end condition is met, such
as an optimal number of features being selected or no features remaining to be tested;
the forward selection algorithm is shown in Algorithm 7.2. The backward elimination
algorithm works in the opposite way, eliminating the worst performing feature at each
iteration; it is shown in Algorithm 7.1. The order in which features are eliminated or
added can serve as a measurement of their impact on the prediction [136].
Algorithm 7.1 (backward elimination, iteration $j$):
For $k = 1 : |A_{j-1}|$
    $B_k = A_{j-1} \setminus \{a_k\}$
    Calculate performance $f(B_k)$
End
$A_j = B_k$ with the best performance

Algorithm 7.2 (forward selection, iteration $j$):
For $k = 1 : |A_{j-1}|$
    $B_k = A_{j-1} \cup \{a_k\}$
    Calculate performance $f(B_k)$
End
$A_j = B_k$ with the best performance
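The backward elimination loop can be sketched compactly; `score` is an illustrative stand-in for whichever performance measure $f$ is used:

```python
def backward_elimination(features, score):
    """Greedy backward elimination (Algorithm 7.1). At each iteration
    the feature whose removal hurts the score the least is eliminated;
    names are illustrative."""
    order = []                      # features in the order they are removed
    current = list(features)
    while len(current) > 1:
        best = max(current,
                   key=lambda a: score([b for b in current if b != a]))
        current.remove(best)
        order.append(best)
    order.append(current[0])        # last survivor: most important feature
    return order
```

The returned elimination order doubles as an importance ranking, which is how the results in Table 7.1 are read.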
The results of the first three real-world datasets are summarized in Table 7.1. The
order in which features are eliminated is shown in descending order, with the last row
of each column showing the last remaining feature, which can be considered the most
important feature for prediction accuracy. To assess the performance of the feature
selection algorithm, P−1 models are trained and evaluated (P being the number of
features in a dataset), each with a decreasing number of features according to the results
shown in Table 7.1. Ideally, a slight decrease in performance should be observed; a
sharp decrease in performance would indicate an improper elimination of a feature.
Results for the Charpy, UTS and Bladder Cancer datasets are shown in Figure 7.1,
Figure 7.2 and Figure 7.3 respectively.
Figure 7.1: Charpy Impact Test SICFIS Backward elimination feature selection results.
Figure 7.3: Bladder Cancer SICFIS Backward elimination feature selection results.
Rough sets and fuzzy rough sets can be utilized to measure the dependency between
features and output variables. The rough set feature dependency is a measure of how
accurately a set of features can describe the output. An information table filled with
irrelevant and/or random features would score a low dependency value. The method
described in this section for feature selection may be classified as a filter/wrapper
method, given that it is necessary to implement "brute-force" algorithms to measure the
feature dependency of different combinations of features. Methods such as particle
swarm optimization [46] and a forward selection algorithm [49], [50] have been
proposed for this purpose.
A major disadvantage of utilizing fuzzy rough set methods for feature selection is
the exponential growth of computational time with the addition of features and
instances in the dataset [52]. The implementation of parallel computing operations
considerably reduces the computation time for larger datasets; nonetheless, memory
problems may arise for "big data" applications.
In section 2.5, rough sets and fuzzy rough sets were introduced. The method for
calculating the fuzzy rough sets, positive region and feature dependency utilized in
this work is the same as the one introduced by Radzikowska and Kerre in [48] and
further developed by Jensen and Shen in [49]. Three fuzzy similarity relationship
equations utilized to calculate fuzzy-rough sets were presented; they are repeated
below for clarification:
$$R_p(x,y) = 1 - \frac{\left|p(x) - p(y)\right|}{p_{\max} - p_{\min}} \tag{7.1}$$

$$R_p(x,y) = \exp\!\left(-\frac{\left(p(x)-p(y)\right)^{2}}{2\sigma_p^{2}}\right) \tag{7.2}$$

$$R_p(x,y) = \max\!\left(\min\!\left(\frac{p(y) - \left(p(x) - \sigma_p\right)}{p(x) - \left(p(x)-\sigma_p\right)},\; \frac{\left(p(x)+\sigma_p\right) - p(y)}{\left(p(x)+\sigma_p\right) - p(x)}\right),\, 0\right) \tag{7.3}$$
The positive region and feature dependency of a fuzzy-rough set are calculated as
follows:

$$\mu_{POS_{R_P}(Q)}(x) = \sup_{X \in U/Q} \mu_{\underline{R_P}X}(x) \tag{7.4}$$

$$\gamma_P(Q) = \frac{\sum_{x \in U} \mu_{POS_{R_P}(Q)}(x)}{|U|} \tag{7.5}$$
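A minimal sketch of the three similarity relations (7.1)-(7.3) and the dependency measure (7.5); the positive-region memberships of (7.4) are taken as given, and all names are illustrative:

```python
import math

def sim1(px, py, pmin, pmax):
    """Fuzzy similarity, eq. (7.1): normalized absolute difference."""
    return 1.0 - abs(px - py) / (pmax - pmin)

def sim2(px, py, sigma):
    """Fuzzy similarity, eq. (7.2): Gaussian kernel."""
    return math.exp(-((px - py) ** 2) / (2.0 * sigma ** 2))

def sim3(px, py, sigma):
    """Fuzzy similarity, eq. (7.3): both denominators reduce to sigma."""
    return max(min((py - (px - sigma)) / sigma,
                   ((px + sigma) - py) / sigma), 0.0)

def dependency(pos_memberships):
    """Feature dependency, eq. (7.5): mean membership to the positive
    region over the universe U."""
    return sum(pos_memberships) / len(pos_memberships)
```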
7.3.1 Results
The order in which features are eliminated at each iteration for the Charpy, UTS
and Bladder Cancer datasets is shown in Table 7.2, Table 7.3 and Table 7.4
respectively. The performance evaluation method utilized with the wrapper-SICFIS
method is implemented, and the results for the Charpy, UTS and Bladder Cancer
datasets are shown in Figure 7.4, Figure 7.5 and Figure 7.6 respectively. It is seen that
the performance of fuzzy similarity 1 is superior compared with fuzzy similarity 2 and
fuzzy similarity 3.
Table 7.2: Fuzzy Rough set feature selection Charpy dataset variables eliminated at
each iteration
Feature Eliminated
Iteration Fuzzy Similarity - 1 Fuzzy Similarity - 2 Fuzzy Similarity - 3
1 V V Mo
2 Ni Ni V
3 Cr Cooling Medium Ni
4 Mo Mn Cooling Medium
5 Mn Cr C
6 Hardening Temperature Mo Site
7 Cooling Medium Hardening Temperature Mn
8 Test Depth Test Depth Cr
9 S S Test Depth
10 Al Site Hardening Temperature
11 Site Impact Temperature Impact Temperature
12 Si Si Si
13 Impact Temperature Al S
14 Size Size Al
15 Tempering Temperature Tempering Temperature Size
Final C C Tempering Temperature
Table 7.3: Fuzzy Rough set feature selection UTS dataset variables eliminated at each
iteration
Feature Eliminated
Iteration Fuzzy Similarity - 1 Fuzzy Similarity - 2 Fuzzy Similarity - 3
1 V V V
2 Al Al Cr
3 Test Depth Cr Al
4 Ni Test Depth Ni
5 Mn Mn Cooling Medium
6 Cooling Medium Cooling Medium Test Depth
7 Site Site Mn
8 S Ni Site
9 Cr Hardening Temperature Hardening Temperature
10 Hardening Temperature S S
11 Si Si Si
12 Size Size Mo
13 C C C
14 Mo Mo Size
Final Tempering Temperature Tempering Temperature Tempering Temperature
Table 7.4: Fuzzy Rough Sets feature selection Cancer dataset features eliminated at
each iteration
Variable Eliminated
Iteration Fuzzy Similarity - 1 Fuzzy Similarity - 2 Fuzzy Similarity - 3
1 Cystectomy Cystectomy Cystectomy
2 Radiotherapy Radiotherapy Radiotherapy
3 Nodes Detail Nodes Detail Nodes Detail
4 Squamous Squamous Squamous
5 CIS Present CIS Present CIS Present
6 Vascular Vascular Vascular
7 SPB SPB SPB
8 Urothelium Urothelium Urothelium
9 Grade Grade Grade
10 Muscle Muscle Muscle
11 Sex Sex Sex
12 Age Age Stage
Final Stage Stage Age
Figure 7.4: Charpy Fuzzy-rough sets Backward elimination feature selection results.
Figure 7.5: UTS Fuzzy-rough sets Backward elimination feature selection results.
The SICFIS model introduced in Chapter 4 maps real-valued inputs to the complex domain, which allows the interaction between features to be modelled as interferences. This process can be represented utilizing the magnitude-phase information of each feature (section 4.4.2.1) to model the behaviour of the system given any input within the range of operation. The magnitude and phase information for a feature p given an input k are as follows:
\mathrm{Mag}_p(k) = |S_p(k)| \qquad \mathrm{Ph}_p(k) = \arg(S_p(k))
where k is a continuous variable with strictly increasing values within the range of
operation of a feature p.
Given that the entire behaviour of the system is represented by the magnitude-phase plots, it is possible to estimate which features are the most important in the system. For example, Figure 7.7 shows the magnitude-phase plots for the Charpy impact test features, utilizing 3 membership functions per feature. Figure 7.8 shows the complex-valued output prediction when fixing all the features to a specific value and varying each one of the following features: Carbon, Sulphur, Nickel and tempering temperature.
From the results shown in Figure 7.8, the feature "tempering temperature" produces the highest complex-valued variance, followed by Carbon, while Nickel and Sulphur hardly produce any variance in the complex-valued output.
Given this example, two different feature importance measurement methods may be implemented. The first method takes into consideration the variables that produce the greater variance in the output; these variables are the magnitude of the resultant vector of a feature and the rate of change of its magnitude and phase. The second method measures the complex-valued covariance between a complex-valued feature and the predicted output.
A feature importance score based on a feature's magnitude and rate of change may be calculated utilizing the magnitude-phase plots, by calculating the area under the curve of the magnitude and the areas under the curves of the magnitude and phase rates of change. This method presents several challenges. The first challenge arises from the dataset itself: such a method would be appropriate only for datasets containing continuous features with a uniform distribution. For example, the Charpy impact test is known for its scattered measurements (the histogram plots of each of the features are shown in Figure 7.9); additionally, the Bladder Cancer dataset contains mostly categorical features.
The following formula replaces the area under the curve of the magnitude with the expected value of the magnitude, and the areas under the curves of the rate of change of the magnitude and the phase with the variance function:

\mathrm{FeatureScore}^{Mag\text{-}Ph}_p = E[\mathrm{Mag}_p] + \mathrm{var}(\mathrm{Mag}_p) + \mathrm{var}(\mathrm{Ph}_p) \quad (7.11)
The calculation of the expected value and variance of the magnitude is straightforward. Calculating the variance of the phase requires some modifications to the variance equation. The variance is calculated as the expected value of the squared distances between the mean and the samples. Given that the angular values are circular, it is more appropriate to calculate the angular distance between the mean value of the complex random variable z_p and each sample as follows:

d(z_p) = \arg\left(E[z_p] \cdot \overline{z_p}\right), \qquad \mathrm{var}(\mathrm{Ph}_p) = E\left[d(z_p)^2\right]
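A sketch of the Mag-Phase score consistent with (7.11), using the wrapped angular distance from the mean direction so that the circularity of the phase is respected (function name is ours; normalization across features is applied separately, as noted below):

```python
import numpy as np

def mag_phase_score(z):
    """Eq. (7.11) sketch: E[|z|] plus variances of magnitude and circular phase.

    z : (n,) complex-valued feature contributions S_p(k) over sampled inputs.
    """
    mag = np.abs(z)
    # angular distance of each sample from the mean direction, wrapped to (-pi, pi]
    mean_dir = np.angle(np.mean(z))
    ang_dist = np.angle(np.exp(1j * (np.angle(z) - mean_dir)))
    var_phase = np.mean(ang_dist ** 2)
    return np.mean(mag) + np.var(mag) + var_phase
```

A feature whose contribution is constant scores only its mean magnitude; variation in either magnitude or direction raises the score.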
Given that the magnitude and the angular distance utilize different measurement scales, each of the variables in (7.11) is normalized to give a proportional weight to each of the variables.
As shown previously, the SICFIS model maps real-valued feature inputs into the complex domain; the covariance between two complex-valued random variables is calculated as follows [137]:
\mathrm{FeatureScore}^{Cov}_p = \left|\mathrm{cov}(z_p, z_{output})\right| \quad (7.16)

\mathrm{FeatureScore}^{Combined}_p = \mathrm{FeatureScore}^{Mag\text{-}Ph}_p + \mathrm{FeatureScore}^{Cov}_p \quad (7.17)

where both scores are normalized such that \sum_p \mathrm{FeatureScore}^{Mag\text{-}Ph}_p = \sum_p \mathrm{FeatureScore}^{Cov}_p = 1.
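The covariance and combined scores of (7.16)–(7.17) can be sketched as follows (a sketch under the assumptions that the complex covariance uses the conjugate form E[(x − E[x]) conj(y − E[y])] and that at least one feature has a non-zero score, so the normalization is well defined):

```python
import numpy as np

def covariance_score(z_features, z_output):
    """Eq. (7.16): |cov| between each complex feature contribution and the output."""
    zc_out = z_output - z_output.mean()
    scores = np.array([np.abs(np.mean((z - z.mean()) * np.conj(zc_out)))
                       for z in z_features])
    return scores / scores.sum()          # normalized so the scores sum to one

def combined_score(mag_phase, cov):
    """Eq. (7.17): sum of the two normalized per-feature scores."""
    return mag_phase / mag_phase.sum() + cov / cov.sum()
```

A feature whose contribution tracks the output receives a high covariance score, while a constant feature scores zero.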
7.4.4 Results
The overall feature score is obtained as the average over K evaluations:

\mathrm{FeatureScore}_p = \frac{\sum_{k=1}^{K} \mathrm{FeatureScore}_{p,k}}{K} \quad (7.18)
Each of the datasets is evaluated utilizing the three feature score equations (7.11), (7.16) and (7.17). Both the normalized and fast SICFIS models are evaluated utilizing the same method explained in the previous sections.
Results of the Charpy impact test for the normalized and fast SICFIS models are shown in Table 7.5 and Table 7.6. The evaluation of each of the equations for both the normalized and fast models is shown in Figure 7.13. The results obtained by the normalized-SICFIS model are superior to those of the fast-SICFIS model, given the early elimination of the tempering temperature feature by the latter. For both models the combined equation performed slightly better than the Mag-Phase equation, while the worst-performing equation for both models was the covariance equation.
Table 7.5: Charpy Normalized-SICFIS filter method for feature selection results.
Combined Score Mag-Phase Score Covariance Score
1 Si 0.0220 Ni 0.0172 Al 0.0172
2 Al 0.0224 Si 0.0210 Si 0.0251
3 H.Temp 0.0401 Al 0.0258 H.Temp 0.0399
4 Ni 0.0632 H.Temp 0.0397 S 0.0411
5 Depth 0.0665 Depth 0.0630 Depth 0.0719
6 Site 0.0733 Site 0.0672 Site 0.0761
7 S 0.0798 Cool. Med. 0.0875 Ni 0.1032
8 Cool. Med. 0.1156 S 0.1061 Mn 0.1246
9 Cr 0.1344 Cr 0.1235 Cr 0.1334
10 Mn 0.1352 Mn 0.1324 Cool. Med. 0.1355
11 V 0.2375 V 0.1531 V 0.2975
12 Mo 0.3619 Mo 0.2907 Imp. Temp. 0.3017
13 C 0.3941 C 0.3049 Mo 0.4168
14 Imp. Temp. 0.5503 Size 0.6329 C 0.4500
15 Size 0.5698 Imp. Temp. 0.7339 Size 0.4648
Final T. Temp. 1.0000 T. Temp 0.9147 T. Temp 1.0000
Table 7.6: Charpy Fast-SICFIS filter method for feature selection results.
Combined Score Mag-Phase Score Covariance Score
1 Al 0.0075 Al 0.0173 Al 0.0097
2 Ni 0.0177 Ni 0.0184 Si 0.0218
3 Si 0.0199 Si 0.0284 Ni 0.0269
4 Site 0.0854 Site 0.0933 Site 0.0834
5 Mo 0.1129 Cool. Med. 0.0937 Mo 0.1014
6 Cool. Med. 0.1320 H.Temp 0.1181 S 0.1372
7 S 0.1361 Mo 0.1293 V 0.1444
8 Depth 0.2006 S 0.1399 Depth 0.1577
9 V 0.2141 Cr 0.1958 Cool. Med. 0.1739
10 H_temp 0.2170 Mn 0.2412 Mn 0.2881
11 Mn 0.2674 Depth 0.2440 H.Temp 0.3148
12 Cr 0.3156 V 0.2835 Size 0.4043
13 Size 0.3700 Size 0.3282 Cr 0.4297
14 T. Temp. 0.5523 T. Temp 0.5264 T. Temp 0.5677
15 C 0.7719 C 0.8556 C 0.6603
Final Imp. Temp. 0.9713 Imp. Temp. 0.9303 Imp. Temp. 0.9799
Results for the UTS dataset for the normalized and fast SICFIS models are shown in Table 7.7 and Table 7.8. The evaluation of each of the equations for both the normalized and fast models is shown in Figure 7.14. The best results are obtained by the combined equation for the fast-SICFIS model. The results from the Mag-Phase and the covariance equations vary between different points, showing a clear advantage in utilizing both equations to obtain better and more robust results.
Results for the Cancer dataset for the normalized and fast SICFIS models are shown in Table 7.9 and Table 7.10. The evaluation of each of the equations for both the normalized and fast models is shown in Figure 7.15. In section 4.7.3, the results for the Cancer dataset utilizing the normalized and fast-SICFIS models showed a clear difference between the two methods, with the fast-SICFIS model better suited for modelling the Cancer dataset; the poor performance of the normalized-SICFIS model for feature selection is therefore the result of its poor performance in prediction. The combined equation provided the best results, as shown in Figure 7.15. From the results observed it is concluded that, after Stage, Age is the most important feature for prediction.
Table 7.7: UTS Normalized-SICFIS filter method for feature selection results.
Combined Score Mag-Phase Score Covariance Score
1 Si 0.00172 Si 0.00128 Si 0.00320
2 Al 0.00426 Al 0.00394 Al 0.00571
3 Depth 0.00777 Depth 0.00776 Depth 0.00860
4 H. Temp 0.01227 H. Temp 0.01289 H. Temp 0.01216
5 V 0.02945 Site 0.02332 V 0.02991
6 S 0.04061 V 0.02854 S 0.04336
7 Site 0.04424 S 0.03700 Site 0.06466
8 Mn 0.06885 Mn 0.04977 Mn 0.08434
9 Size 0.12031 Cool. Med. 0.05208 C 0.12399
10 Cool. Med. 0.12953 Size 0.09009 Size 0.14659
11 C 0.20376 Mo 0.21006 Cr 0.18584
12 Mo 0.29607 C 0.27420 Cool. Med. 0.20420
13 Cr 0.35418 Ni 0.27670 Mo 0.36804
14 Ni 0.42908 Cr 0.49190 Ni 0.55409
Final T. Temp 1.00000 T. Temp 0.95626 T. Temp 1.00000
Table 7.8: UTS Fast-SICFIS filter method for feature selection results.
Combined Score Mag-Phase Score Covariance Score
1 Al 0.00000 Al 0.00000 Al 0.00000
2 Si 0.00640 Si 0.00218 Si 0.01061
3 V 0.01621 Depth 0.00345 V 0.02894
4 Depth 0.02358 V 0.00348 Depth 0.04371
5 H. Temp 0.03047 S 0.00603 H. Temp 0.05384
6 S 0.03049 H. Temp 0.00710 S 0.05495
7 Cool. Med. 0.06226 Cool. Med. 0.01945 Mn 0.07376
8 Mn 0.07073 Size 0.02420 Cool. Med. 0.10508
9 Site 0.08729 Site 0.02525 Site 0.14934
10 Cr 0.11057 Cr 0.05424 Cr 0.16691
11 Size 0.13020 Mo 0.06677 Size 0.23621
12 C 0.16364 Mn 0.06769 C 0.25593
13 Mo 0.18410 C 0.07135 Mo 0.30143
14 Ni 0.22954 Ni 0.07570 Ni 0.38337
Final T. Temp 1.00000 T. Temp 1.00000 T. Temp 1.00000
Table 7.9: Bladder Cancer Normalized-SICFIS filter method for feature selection
results.
Combined Score Mag-Phase Score Covariance Score
1 Cystectomy 0.0033 Cystectomy 0.0053 Cystectomy 0.0035
2 Vascular 0.0485 Vascular 0.0303 Radiotherapy 0.0288
3 Radiotherapy 0.0548 Radiotherapy 0.0676 Squamous 0.0305
4 Grade 0.0908 Grade 0.0962 Urothelium 0.0611
5 Urothelium 0.1078 Urothelium 0.1386 Vascular 0.0778
6 Nodes Detail 0.1166 Nodes Detail 0.1403 Nodes Detail 0.0862
7 Squamous 0.1686 Squamous 0.1438 CIS Present 0.1459
8 CIS Present 0.1974 CIS Present 0.1694 Muscle 0.1460
9 Muscle 0.2139 Muscle 0.2226 Sex 0.1567
10 Age 0.2584 Age 0.2601 Age 0.2416
11 Sex 0.2613 Sex 0.3349 Grade 0.3126
12 SPB 0.7550 SPB 0.8052 SPB 0.6131
Final Stage 0.9581 Stage 0.8308 Stage 1.0000
Table 7.10: Bladder Cancer Fast-SICFIS filter method for feature selection results.
Combined Score Mag-Phase Score Covariance Score
1 Squamous 0.0017 Squamous 0.0021 Sex 0.0026
2 Vascular 0.0088 Vascular 0.0070 Squamous 0.0032
3 Radiotherapy 0.0089 Radiotherapy 0.0104 Radiotherapy 0.0095
4 Cystectomy 0.0104 Cystectomy 0.0125 Cystectomy 0.0104
5 Sex 0.0178 Sex 0.0171 Nodes Detail 0.0196
6 Nodes Detail 0.0243 Nodes Detail 0.0180 Vascular 0.0437
7 Grade 0.0718 Grade 0.0823 Muscle 0.0612
8 Muscle 0.1453 Muscle 0.0841 Urothelium 0.1417
9 SPB 0.1752 SPB 0.1326 CIS Present 0.1506
10 Urothelium 0.1948 Urothelium 0.1503 Grade 0.3089
11 CIS Present 0.2578 CIS Present 0.2014 Age 0.3267
12 Age 0.3222 Age 0.3193 SPB 0.3846
Final Stage 1.0000 Stage 1.0000 Stage 1.0000
The results of the combined filter-SICFIS method for the fast and normalized SICFIS models, the wrapper-SICFIS method and the best performing fuzzy rough set method are plotted for comparison purposes. Some variation is expected given random effects during training.
Results for the Charpy impact test are shown in Figure 7.13. The worst-performing method is the filter fast-SICFIS method, while the remaining methods perform equivalently, and most of the difference in performance can be attributed to random errors.
The UTS results are shown in Figure 7.14. The wrapper method provided the best results, while the performance of the remaining methods deviates slightly from the wrapper method at different points.
The Bladder Cancer results are shown in Figure 7.15. The worst-performing model is the filter normalized-SICFIS method. This is expected, given the results observed in section 4.7.3. The remaining differences are attributed to random errors.
The computation times of each algorithm are shown in Table 7.11. A steep increase in computational time for the UTS dataset is observed when utilizing any of the fuzzy-rough set methods. This increase is due to the UTS dataset containing twice the number of instances of the Charpy impact dataset, as the fuzzy-rough set methods scale with the number of instance pairs. For the wrapper method the number of features in the dataset has more impact than the number of instances. The filter-SICFIS method proposed in this work produced the lowest computational time, as expected, with a considerable reduction in computational times.
Table 7.11: Computation time comparison between the different datasets and methods
measured in seconds (s).
Charpy UTS Cancer
Wrapper-SICFIS 289.25 s 234.43 s 121.95 s
FRS-01 101.49 s 1012.2 s 57.34 s
FRS-02 100.78 s 975.03 s 56.07 s
FRS-03 101.35 s 978.05 s 56.34 s
Filter N-SICFIS 31.26 s 32.05 s 22.05 s
Filter F-SICFIS 17.67 s 22.81 s 13.83 s
FRS: Fuzzy Rough Set, N: Normalized, F: Fast.
Given the large size of the superconductivity dataset, it is not possible to implement the fuzzy-rough set and wrapper feature selection methods. In [109], the authors present the 20 most significant features obtained from an XGBoost analysis. The results obtained from the three feature selection algorithms, as well as the XGBoost analysis, are shown in Table 7.13. In order to compare the efficacy of the feature selection algorithms, a reduced dataset consisting of the 20 most significant features is utilized for training 5-membership-function normalized and fast SICFIS models. The results of the evaluation are shown in Table 7.12.
7.7 Summary
From the results obtained, the best performing algorithm on the first three datasets was the wrapper method utilizing the fast-SICFIS model. The feature selection method utilizing fuzzy rough sets with the first similarity equation also produced comparable results, although on the UTS dataset it was outperformed by the wrapper method. The filter-SICFIS method performed comparably with the other methods, with a slight decrease in performance observed on the UTS and Cancer datasets.
Given the demand for computationally efficient code to deal with big data, neither the fuzzy rough set nor the wrapper methods are well equipped for large datasets such as the superconductivity dataset. The filter-SICFIS method has shown promising results for the smaller datasets but requires additional modifications for larger datasets.
Chapter 8
Fuzzy Rough Sets for Data-mining: Inconsistency
Identification and Modelling
8.1 Introduction
The Charpy impact dataset is known to be difficult to model due to the scatter in the dataset and inconsistencies in the measurement values [129]. Objects in an information table are considered inconsistent when two or more objects contain the same or similar feature values but different outputs. Inconsistencies arise either from errors in measurement or from features not included in the information table. Rough sets can be utilized to identify inconsistent records and to measure the degree of inconsistency in a dataset.
This chapter proposes an application of fuzzy rough sets for modelling with inconsistent datasets. The proposed modelling paradigm is to: 1. identify and classify consistent and inconsistent instances present in the dataset utilizing fuzzy rough sets; 2. propose a method for identifying inconsistencies in a testing partition; 3. improve upon the results by creating different models to predict the previously identified consistent and inconsistent partitions; and 4. generate a multiple-point prediction instead of a single point to model inconsistencies and aid in the development of material design.
The consistency of an object can be measured by utilizing the positive region of the lower approximation of a fuzzy-rough set (8.2). The feature dependency (8.1), utilized to measure the average consistency of the dataset, is calculated as:

\gamma'_P(Q) = \frac{\sum_{x \in U} \mu_{POS_{R_P}(Q)}(x)}{|U|} \quad (8.1)
Table 8.1 shows an example of an inconsistent information granule. The features are normalized, rounded and randomly selected for confidentiality reasons. The positive region score (8.2) shown in the last column allows such an information granule to be identified as inconsistent. Given that the membership value of the positive region ranges from 0 to 1, it is necessary to select a threshold value to classify objects as either consistent or inconsistent.
No. Ftr-1 Ftr-2 Ftr-3 Ftr-4 Ftr-5 Ftr-6 Ftr-7 Ftr-8 Ftr-9 Output POS
1 0.05 0.25 0.50 0.44 0.02 0.31 0.23 0.03 0.35 106.204 0.29
2 0.05 0.25 0.50 0.44 0.02 0.31 0.23 0.03 0.35 173.543 0.29
3 0.05 0.25 0.50 0.44 0.02 0.31 0.23 0.03 0.35 173.543 0.30
4 0.05 0.25 0.50 0.44 0.02 0.31 0.23 0.03 0.35 61.011 0.33
5 0.05 0.25 0.49 0.44 0.02 0.31 0.23 0.03 0.35 89.9347 0.33
6 0.05 0.25 0.49 0.44 0.02 0.31 0.23 0.03 0.35 86.319 0.33
7 0.05 0.25 0.49 0.44 0.02 0.31 0.23 0.03 0.35 121.118 0.40
8 0.05 0.25 0.49 0.44 0.02 0.31 0.23 0.03 0.35 101.233 0.40
Ftr: Feature.
Table 8.2 shows the feature dependency value of the first three real-world datasets explored in this work. It can be observed that the Cancer dataset contains by far the lowest feature dependency, followed by the Charpy impact test, while the UTS dataset can be considered mostly consistent.
The low feature dependency observed in the Bladder Cancer dataset is related to the complex relationships and differences between patients' genetics and lifestyles, making a prediction based on a few parameters highly difficult and uncertain [39].
Figure 8.2 shows the effect of the number of features and the selected threshold on the number of inconsistencies. The number of inconsistencies grows significantly with the elimination of features, even when these features have a small impact on prediction accuracy, as observed in Figure 7.1 in the previous chapter.
Figure 8.2: Effects on the number of inconsistencies given different number of features
and different threshold values.
The KNN algorithm can be utilized for classification tasks; it classifies a testing sample based on the known class values of the k nearest samples [138]. An example of KNN classification is shown in Figure 8.3. Different metrics can be implemented for finding the nearest neighbours. In this work a Euclidean distance metric is used, together with a weighted scheme in which nearer neighbours have more impact on the decision than farther neighbours; ties are resolved by the nearest neighbour.
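The weighted voting and tie rule just described can be sketched as follows (a minimal sketch; the 1/d weighting and the eps guard against zero distances are our assumptions about the implementation details):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=3, eps=1e-12):
    """Distance-weighted k-NN: closer neighbours carry more weight (1/d);
    ties between classes are resolved by the single nearest neighbour."""
    d = np.linalg.norm(X_train - x, axis=1)          # Euclidean distances
    nn = np.argsort(d)[:k]
    votes = {}
    for i in nn:
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (d[i] + eps)
    best = max(votes.values())
    tied = [c for c, v in votes.items() if v == best]
    return tied[0] if len(tied) == 1 else y_train[nn[0]]
```

With this scheme a single very close neighbour can outweigh several distant ones, which is the intended behaviour when labelling test samples as consistent or inconsistent.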
Table 8.3: Accuracy varying the number of features and the number of k neighbours
16 Features 14 Features 12 Features 10 Features 8 Features
k=1 89.47 89.08 86.18 87.06 86.72
k=2 89.47 89.08 86.22 87.06 86.72
k=3 89.45 89.10 86.28 86.97 86.95
k=4 89.62 89.18 86.07 86.85 86.81
k=5 90.15 88.87 86.24 86.79 86.56
k=6 89.94 88.80 86.43 86.79 86.47
k=7 90.10 88.87 86.39 86.55 86.34
k=8 89.98 88.78 86.34 86.43 86.18
k=9 89.92 88.93 86.36 86.15 85.82
k=10 89.83 88.86 86.51 86.34 85.73
Figure 8.4: Effect of inconsistencies on Charpy impact prediction (a) and UTS prediction (b).
On the one hand, removing inconsistent objects from the dataset may cause the loss of valuable information, limiting the prediction capabilities of a model. On the other hand, inconsistencies may result in unreliable models and a considerable increase in the prediction error, as observed in Figure 8.4. Therefore, in the presence of inconsistencies it is proposed to implement a modelling strategy which considers the inconsistencies present in the dataset and performs predictions accordingly. Instead of providing a single-point prediction, a set of predictions is presented in regions estimated to contain inconsistencies.
Two or more instances are considered inconsistent when they contain the same or very similar feature values and different outputs. These inconsistencies account for a large portion of the prediction error; nonetheless, they contain valuable information, provided the inconsistencies do not arise from errors in measurement but from a lack of information. This was confirmed by the observed increase in inconsistencies with the removal of features.
In the case of the Charpy impact test, the considerable number of inconsistencies in measurements is well known. Some of these inconsistencies may be attributed to inhomogeneities in the microstructure [139], or to other features that are difficult or not cost-efficient to measure.
Initially, the inconsistencies are identified utilizing the positive region of the fuzzy-rough sets, calculated utilizing the fuzzy similarity equation (7.1). The consistent instances are added to a set C, while the inconsistent instances are divided into N different clusters; a model is then trained utilizing the consistent partition together with each one of the inconsistent partitions In. The process is summarized in Algorithm 8.1.
8.4.1 Results
used for performing the predictions. A k-NN classification with k = 1 is performed on the testing partition to identify inconsistencies.
Algorithm 8.1: Data selection for training M SICFIS models to perform the multiple point prediction.
Inputs: Charpy impact dataset H, threshold Thr
Outputs: set containing consistent elements C, set containing inconsistent elements I, set of M trained SICFIS models
C = ∅, I = ∅
Calculate POS_{R_P}(Q)(h) for all elements h in H
For j = 1 to |H|:
    If POS_{R_P}(Q)(h_j) ≥ Thr: C = C ∪ {h_j}
    Else: I = I ∪ {h_j}
Create a KNN model with C and I
Train SICFIS_1 with C
Create N clusters from the inconsistent set I: I = {I_c1, I_c2, ..., I_cN}
For j = 1 to N:
    Train SICFIS_{j+1} with C ∪ I_cj
End
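The data-partitioning steps of Algorithm 8.1 can be sketched as follows (model training is out of scope here; the clustering step is left as a caller-supplied callable, an assumption of this sketch rather than the thesis implementation):

```python
import numpy as np

def partition_for_training(H_features, pos_scores, thr, n_clusters, cluster_fn):
    """Algorithm 8.1 sketch: split data into consistent/inconsistent sets and
    build the training index sets for the M = n_clusters + 1 SICFIS models.

    pos_scores : positive region membership of each instance.
    cluster_fn : callable assigning each inconsistent instance a cluster id
                 in 0..n_clusters-1 (e.g. k-means; supplied by the caller).
    """
    consistent = np.where(pos_scores >= thr)[0]
    inconsistent = np.where(pos_scores < thr)[0]
    cluster_ids = cluster_fn(H_features[inconsistent], n_clusters)
    training_sets = [consistent]                      # model 1: consistent only
    for c in range(n_clusters):                       # model j+1: C ∪ I_cj
        members = inconsistent[cluster_ids == c]
        training_sets.append(np.concatenate([consistent, members]))
    return consistent, inconsistent, training_sets
```

Each returned index set would then be used to train one SICFIS model, with the 1-NN classifier deciding at test time which model's prediction applies.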
The results for the consistent and inconsistent partitions are shown in Figure 8.5 and Figure 8.6 respectively. A greater gap between the benchmark and the prediction intervals can be observed for the inconsistent testing partition. Table 8.4 shows the mean gap in prediction measured using the RMSE index. Furthermore, it is observed from Figure 8.6 that both the benchmark model and the intervals seem unable to produce proper predictions for the inconsistent testing partition.
Table 8.4: Mean absolute prediction difference between the prediction interval for the
consistent and inconsistent partitions
Mean prediction interval absolute difference
Inconsistent Testing partition 29.31 RMSE
Consistent Testing Partition 13.95 RMSE
Figure 8.5: Charpy Impact test prediction interval for consistent testing partition.
Figure 8.6: Charpy Impact test prediction interval for inconsistent testing partition.
It was shown in Table 8.2 that the Cancer dataset had the worst feature dependency score, meaning that most of the records are inconsistent. This is well known in medicine, given that the different effects of lifestyle and genetics make it almost impossible to obtain consistent results. Utilizing a threshold, the most consistent data points were selected.
A summary of the results is shown in Table 8.5; the consistent partition consists of 97 patient records. Most of these records correspond to patients whose time of death was within the first five years, as observed from the mean observed time of 10 months. The age and grade means are also above the dataset average, while the stage tends to be below the average.
8.6 Summary
In this work a method for evaluating the consistency of a dataset utilizing fuzzy rough sets was implemented for data-mining. The feature dependency was shown to measure the average consistency of a dataset. Inconsistencies are the result of instances that contain the same or similar input values but exhibit different outputs.
Additionally, fuzzy rough sets can be used to identify consistencies in the dataset, as was the case in the Cancer dataset, where it is possible to determine which parameter values produce more consistent results. Such information can be used by a medical professional when evaluating the life expectancy of a patient.
Chapter 9
Conclusions and Future Work
9.1 Conclusions
Among the research realized worldwide on the topic of CFSs, only three research groups have focused on the development of CFISs, resulting in the development of ANCFIS, CNFIS and ACNFIS. Neither the ACNFIS nor the CNFIS model exploits the property of interference, which, according to Ramot, is the main property of CFSs. Furthermore, both models (CNFIS and ACNFIS) ignore, for the most part, the effect and meaning of the imaginary component of the output. It can be concluded that neither of these two models is an adequate CFIS; they should be considered instead as modifications to the real-valued ANFIS. The ANCFIS model, however, utilizes the complex component of the CFS to model interferences by using a dot product operation. ANCFIS was developed for time series applications, showing promising results. Regardless, none of the research groups have adequately addressed the problem of interpretability, the raison d'être of fuzzy logic.
The SICFIS model introduced in Chapter 4 is therefore the first interpretable CFIS hitherto proposed. The SICFIS exploits the property of interference to model the complex interaction between features and outputs, resulting in a parsimonious model framework. The expansion to the complex domain presents several advantages over traditional FISs, including a higher prediction accuracy, faster computation times and greater interpretability given the number of tools capable of extracting and representing knowledge. The magnitude-phase plots demonstrate the model's full transparency, and the interpretability analysis performed for the Charpy impact test demonstrated its interpretability. Both the normalized and fast SICFIS models outperformed most of the FISs across different applications, and the choice of one over the other is problem dependent, as was observed in the Bladder Cancer results, where the fast-SICFIS outperformed the normalized-SICFIS. This can be attributed to the number of categorical variables present in the dataset.
Given the fast-SICFIS model's considerable reduction in computational time and its simple structure, it was possible to improve upon the ANFIS model by replacing the linear consequents with SICFIS models. The premises create a partition in the feature space, where each rule represents a local model. The global model is therefore composed of an ensemble of interpretable local SICFISs. The performance obtained is comparable with that obtained by a large ensemble of ANNs. The interpretability of the model was assessed with a global-local performance index in all four datasets. Given the large number of categorical variables present in the Bladder Cancer dataset, there was a decrease in performance compared with the SICFIS.
The knowledge extracted from the SICFIS model may potentially be utilized for further applications. In Chapter 7 a feature selection algorithm was developed, based on the complex-valued information obtained from the SICFIS output. The filter-SICFIS method assigns a score to each of the features based on their importance.
Fuzzy rough sets have been mostly utilized for feature selection. In Chapter 8 fuzzy rough sets were applied to the Charpy and Bladder Cancer datasets. Both datasets present tough challenges given the number of inconsistencies, which were identified from the positive region of the lower approximation of the fuzzy rough sets. It was demonstrated that the prediction errors can largely be attributed to the presence of inconsistencies.
Overfitting was observed in the ANFIS-SICFIS model with the addition of rules. In order to improve upon the results and reduce overfitting, the implementation of regularization strategies may potentially solve this problem while maintaining a good global-local performance. The implementation of better methods for rule elicitation may improve the results obtained even further. For datasets containing a large number of categorical variables further research needs to be conducted, such as the implementation of hyperparameter optimization.
By incorporating type-2 strategies, the system may potentially model uncertainties and improve upon the results.
References
[1] K. He, X. Zhang, S. Ren, and J. Sun, “Delving Deep Into Rectifiers: Surpassing
Human- evel Performance On Imagenet Classification,” In 2015 IEEE
International Conference On Computer Vision (ICCV), 2015, pp. 1026–1034.
[2] P. Mamoshina, A. Vieira, E. Putin, and A. Zhavoronkov, “Applications Of Deep
earning In Biomedicine,” Mol. Pharm., vol. 13, no. 5, pp. 1445–1454, May 2016.
[3] J. B. Heaton, N. G. Polson, and J. H. Witte, “Deep Learning In Finance,”
arXiv:1602.06561 [cs], Feb. 2016.
[4] M. Kaminski, “The Right To Explanation, Explained,” Berkeley Technol. Law J.,
vol. 34, no. 1, p. 189, May 2019.
[5] L. A. Zadeh, “Fuzzy Sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, Jun. 1965.
[6] E. H. Mamdani, “Application Of Fuzzy Algorithms For Control Of Simple
Dynamic Plant,” Proc. Inst. Electr. Eng., vol. 121, no. 12, pp. 1585–1588, Dec.
1974.
[7] J. M. Alonso and L. Magdalena, “Special Issue On Interpretable Fuzzy Systems,”
Inf. Sci., vol. 181, no. 20, pp. 4331–4339, Oct. 2011.
[8] D. Ramot, R. Milo, M. Friedman, and A. Kandel, “Complex Fuzzy Sets,” IEEE
Trans. Fuzzy Syst., vol. 10, no. 2, pp. 171–186, 2002.
[9] D. Dubois and H. Prade, “Putting Rough Sets and Fuzzy Sets Together,” In
Intelligent Decision Support: Handbook Of Applications and Advances Of The
Rough Sets Theory, R. Słowiński, Ed. Dordrecht: Springer Netherlands, 1992, pp.
203–232.
[10] J.-S. R. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing: A
Computational Approach To Learning and Machine Intelligence. Prentice Hall,
1997.
[11] T. J. Ross, Fuzzy Logic With Engineering Applications, 3rd Ed. Chichester,
U.K: John Wiley, 2010.
[12] O. Nelles, Nonlinear System Identification: From Classical Approaches To
Neural Networks and Fuzzy Models. Berlin; London: Springer, 2011.
[13] J. Espinosa, J. Vandewalle, and V. Wertz, Fuzzy Logic, Identification, and
Predictive Control. London; New York: Springer, 2004.
[14] T. Takagi and M. Sugeno, “Fuzzy Identification Of Systems and Its
Applications To Modeling and Control,” IEEE Trans. Syst. Man Cybern., vol.
SMC-15, no. 1, pp. 116–132, 1985.
[15] N. Yubazaki, J. Yi, M. Otani, and K. Hirota, “SIRMs Connected Fuzzy
Inference Model and Its Applications To First-Order Lag Systems and Second-
Order Lag Systems,” In Soft Computing In Intelligent Systems and Information
Processing. Proceedings Of The 1996 Asian Fuzzy Systems Symposium, 1996, pp.
545–550.
Information Processing Society (NAFIPS) Held Jointly With 2015 5th World
Conference On Soft Computing (WConSC), 2015, pp. 1–6.
[82] O. Yazdanbakhsh and S. Dick, “Forecasting Of Multivariate Time Series Via
Complex Fuzzy Logic,” IEEE Trans. Syst. Man Cybern. Syst., vol. 47, no. 8, pp.
2160–2171, Aug. 2017.
[83] O. Yazdanbakhsh and S. Dick, “ANCFIS-ELM: A Machine Learning
Algorithm Based On Complex Fuzzy Sets,” 2012, pp. 2007–2013.
[84] C. Li and T.-W. Chiang, “Intelligent Financial Time Series Forecasting: A
Complex Neuro-Fuzzy Approach With Multi-Swarm Intelligence,” Int. J. Appl.
Math. Comput. Sci., vol. 22, no. 4, pp. 787–800, Dec. 2012.
[85] C. Li and T.-W. Chiang, “Complex Neurofuzzy ARIMA Forecasting—A New
Approach Using Complex Fuzzy Sets,” IEEE Trans. Fuzzy Syst., vol. 21, no. 3, pp.
567–584, Jun. 2013.
[86] C. Li and F. Chan, “Complex-Fuzzy Adaptive Image Restoration – An
Artificial-Bee-Colony-Based Learning Approach,” In Intelligent Information and
Database Systems, 2011, pp. 90–99.
[87] C. Li and T.-W. Chiang, “Complex Fuzzy Computing To Time Series
Prediction — A Multi-Swarm PSO Learning Approach,” In Intelligent Information
and Database Systems, 2011, pp. 242–251.
[88] C. Li and F.-T. Chan, “Knowledge Discovery By An Intelligent Approach
Using Complex Fuzzy Sets,” In Intelligent Information and Database Systems, vol.
7196, J.-S. Pan, S.-M. Chen, and N. T. Nguyen, Eds. Berlin, Heidelberg: Springer
Berlin Heidelberg, 2012, pp. 320–329.
[89] C. Li, T.-W. Chiang, and L.-C. Yeh, “A Novel Self-Organizing Complex
Neuro-Fuzzy Approach To The Problem Of Time Series Forecasting,”
Neurocomputing, vol. 99, pp. 467–476, 2013.
[90] C. Mencar and A. M. Fanelli, “Interpretability Constraints For Fuzzy
Information Granulation,” Inf. Sci., vol. 178, no. 24, pp. 4585–4618, Dec. 2008.
[91] Z. C. Lipton, “The Mythos Of Model Interpretability,” arXiv:1606.03490 [cs, stat],
Jun. 2016.
[92] A. Riid, R. Isotamm, and E. Rüstern, “Transparency Analysis Of First-Order
Takagi-Sugeno Systems,” p. 7.
[93] M. J. Gacto, R. Alcalá, and F. Herrera, “Interpretability Of Linguistic Fuzzy
Rule-Based Systems: An Overview Of Interpretability Measures,” Inf. Sci., vol.
181, no. 20, pp. 4340–4360, Oct. 2011.
[94] T. R. Razak, J. M. Garibaldi, C. Wagner, A. Pourabdollah, and D. Soria,
“Interpretability Indices For Hierarchical Fuzzy Systems,” In 2017 IEEE
International Conference On Fuzzy Systems (FUZZ-IEEE), 2017, pp. 1–6.
[95] J. M. Alonso, C. Castiello, and C. Mencar, “Interpretability Of Fuzzy Systems:
Current Research Trends and Prospects,” In Springer Handbook Of Computational
Intelligence, Springer, Berlin, Heidelberg, 2015, pp. 219–237.
[113] J. H. Lilly, Fuzzy Control and Identification. Hoboken, NJ, USA: John Wiley &
Sons, Inc., 2010.
[114] Q. Zhang and M. Mahfouf, “Fuzzy Modelling Using A New Compact Fuzzy
System: A Special Application To The Prediction Of The Mechanical Properties
Of Alloy Steels,” 2011, pp. 1041–1048.
[115] A. Botev, H. Ritter, and D. Barber, “Practical Gauss-Newton Optimisation For
Deep Learning,” arXiv:1706.03662 [stat], Jun. 2017.
[116] M. T. Hagan and M. B. Menhaj, “Training Feedforward Networks With The
Marquardt Algorithm,” IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989–993, Nov.
1994.
[117] R. Muscat and M. Mahfouf, “Predicting Charpy Impact Energy For Heat-
Treated Steel Using A Quantum-Membership-Function-Based Fuzzy Model,”
IFAC-Pap., vol. 49, no. 20, pp. 138–142, 2016.
[118] Shen Wang and M. Mahfouf, “Multi-Objective Optimisation For Fuzzy
Modelling Using Interval Type-2 Fuzzy Sets,” 2012, pp. 1–8.
[119] J. R. Davis, Ed., Alloying: Understanding The Basics. Materials Park, Ohio:
ASM International, 2011.
[120] G. R. Speich, D. S. Dabkowski, and L. F. Porter, “Strength and Toughness Of
Fe-10Ni Alloys Containing C, Cr, Mo, and Co,” Metall. Trans., vol. 4, no. 1, pp.
303–315, Jan. 1973.
[121] H. Takagi, N. Suzuki, T. Koda, and Y. Kojima, “Neural Networks Designed On
Approximate Reasoning Architecture and Their Applications,” IEEE Trans. Neural
Netw., vol. 3, no. 5, pp. 752–760, Sep. 1992.
[122] E. Mizutani and J.-S. R. Jang, “Coactive Neural Fuzzy Modeling,” In Proceedings
Of ICNN ’95 - International Conference On Neural Networks, 1995, vol. 2, pp. 760–
765 vol.2.
[123] R. Rajesh and M. R. Kaimal, “T–S Fuzzy Model With Nonlinear Consequence
and PDC Controller For A Class Of Nonlinear Control Systems,” Appl. Soft
Comput., vol. 7, no. 3, pp. 772–782, Jun. 2007.
[124] A. Sala and C. Ariño, “Polynomial Fuzzy Models For Nonlinear Control: A
Taylor Series Approach,” IEEE Trans. Fuzzy Syst., vol. 17, no. 6, pp. 1284–1295,
Dec. 2009.
[125] K. Tanaka, H. Yoshida, H. Ohtake, and H. O. Wang, “A Sum-Of-Squares
Approach To Modeling and Control Of Nonlinear Dynamical Systems With
Polynomial Fuzzy Systems,” IEEE Trans. Fuzzy Syst., vol. 17, no. 4, pp. 911–922,
Aug. 2009.
[126] J. Dong, Y. Wang, and G. Yang, “Output Feedback Fuzzy Controller Design
With Local Nonlinear Feedback Laws For Discrete-Time Nonlinear Systems,”
IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 40, no. 6, pp. 1447–1459, Dec.
2010.
[127] M. Delgado, A. F. Gomez-Skarmeta, and F. Martin, “A Fuzzy Clustering-Based
Rapid Prototyping For Fuzzy Rule-Based Modeling,” IEEE Trans. Fuzzy Syst., vol.
5, no. 2, pp. 223–233, May 1997.