34 1 117
An artificial neural network is a densely interconnected array of simple processing elements (referred to as neurons herein) arranged in an input layer, an output layer, and (usually) one or more 'hidden layers' that have neither direct inputs from outside the network nor direct outputs beyond the network, all designed to simulate the learning and recall functions of analogous structures in the brain (Fig. 1). For the purposes of this study, the inputs to the neural network will be numerical values derived from (simulated) XRD patterns of clay-mineral mixtures, and the desired outputs will be the weight fractions of the constituents of those mixtures.

The connections between neurons in successive layers are called synapses (another obvious biological analogy), and the synapses have adjustable weights associated with them. The function of a neuron in layer i is simply to multiply the inputs from neurons in layer i-1 by the weights associated with the respective synapses, add a bias term, and then operate on the result with a mathematical transfer function to obtain the output of that neuron to neurons in layer i+1. If, for example, the inputs from layer i-1 to a given neuron in layer i are x1, x2, ..., xn (Fig. 2), these are multiplied by their respective synaptic weights w1, w2, ..., wn, summed with a bias (b), and then operated on by a transfer function to determine the output from that neuron to neurons in layer i+1. The transfer function may be linear, sigmoidal, or any other type of smooth (differentiable) function. The range in the desired output largely determines the choice of transfer function; for example, if outputs in the range 0 to 1 are desired, a log-sigmoid transfer function is appropriate (i.e. a smoothed step
FIG. 1. Generalized architecture of a neural network (input data, input layer, hidden layers, output layer, results). In a back-propagation network, as used in this study, information flows only from left to right between neurons during training, until outputs are compared with target values, and corrections are 'back-propagated' through the network.

FIG. 2. The structure of a neuron, or processing element, of an artificial neural network: x1...xn represent the n inputs (with n weights and biases), and y1...ym represent the outputs to the m neurons of the next layer. The transfer function is described in the text.
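The weighted sum and transfer operation of Fig. 2 can be sketched for a single neuron as follows (a minimal illustration in Python; the function names and numerical values are illustrative, not taken from the study):

```python
import math

def logsig(v):
    # Log-sigmoid transfer function: a smoothed step onto the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(x, w, b, transfer=logsig):
    # Multiply the inputs by their synaptic weights, add the bias term,
    # then apply the transfer function, as in Fig. 2.
    return transfer(sum(xi * wi for xi, wi in zip(x, w)) + b)

y = neuron_output([0.2, 0.5, 0.1], [1.0, -2.0, 0.5], 0.3)
```

Because the log-sigmoid is bounded by 0 and 1, a neuron of this kind is a natural choice when the outputs are to be weight fractions.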
Phase analysis of clay minerals 119
function on the interval (0,1)). This is the case here, of course, inasmuch as outputs are to be weight fractions in a mixture.

Because so-called back-propagation neural networks with a sigmoidal hidden layer are capable of closely approximating any non-linear function that has a finite number of discontinuities (Demuth & Beale, 1992), that type of architecture was chosen for this study. In a back-propagation network, each neuron in layer i (including the input layer) is connected to every neuron in layer i+1 (including the output layer), as indicated schematically in Fig. 1.

Artificial neural networks are particularly adept at solving problems involving pattern recognition or function approximation, and can be quite successful in spite of noisy data. For these reasons they have been extensively employed in monitoring industrial processes for years. The relatively few applications in geology have involved such diverse fields as remote sensing (e.g. Bischof et al., 1992), well-log analysis (Huang et al., 1996), and the classification of rocks (Carr & Hibbard, 1991). Griffen et al. (1995) have given a progress report on the use of artificial neural networks for the modal analysis of igneous rocks using X-ray powder diffraction data.

The identification and quantitative analysis of mixtures and interstratifications of clay minerals using X-ray powder diffraction has recently involved the modelling of theoretical diffraction patterns for matching in some way against the observed diffraction patterns. For example, Bish (1993) has used the Rietveld method for well-crystallized clay minerals; Pevear & Schuette (1993) have applied a genetic algorithm to the trial-and-error method of matching a mixture of four specific clay minerals to a diffraction pattern; Jones (1989) has developed a computerized curve-fitting/peak-decomposition method. Among the most widely used software packages for simulating XRD patterns of mixed-layer clays (as well as of pure clays, and for approximating the patterns of non-interstratified mixtures) are NEWMOD© (Reynolds, 1985) and NEWMOD2© (Reynolds & Reynolds, 1987). Walker (1993) has described the software in detail.

The purpose of this paper is to demonstrate the potential for using neural networks for the quantitative phase analysis of clay-mineral mixtures, given high quality X-ray powder diffraction data. Both preprocessing and neural network calculations were done with MATLAB and the MATLAB Neural Network Toolbox, although several commercial packages for neural computing are available, and no endorsement of any particular software is intended.

METHODS

With a back-propagation neural network (also known as a feed-forward neural network), learning is supervised, in that both the training input and the desired results (target data) are provided to the network through many cycles of training. Information flows in the direction from the input buffer to the output buffer; that is, there is no bidirectional flow or feedback, until the final output based on one cycle through the training input is compared with the desired output. Weights for each synapse are set randomly at first, so the initial outputs are far from the target data. The network is repeatedly presented with the sets of training input and target data, and in each training cycle (or epoch) the errors (differences between outputs and target data) are used by an algorithm external to the network to adjust the weights associated with each neuron, in order to produce output data more closely matching the target data. (The name 'back-propagation' comes from the fact that information used to adjust the weights is propagated back down the network after each complete training cycle.) When the sum of the squared errors reaches some predetermined value deemed to be acceptable, the network is said to have converged on the solution, and the weights are frozen. Thereafter the trained network should provide reasonable results when presented with similar data not used in the training set.

One of the most common ways of adjusting the synaptic weights is the method of gradient descent, wherein the weights (and biases) are shifted in a direction opposite to the gradient in the error surface. The error is represented by

E = 0.5 Σj (zj − yj)²    (1)

where zj and yj are the jth target and output values in the target and output vectors z and y corresponding to an input vector x. The error surface comprises the error values for all possible input vectors and all possible weights. The changes in the weights are calculated by the gradient descent algorithm, involving the input, output, and target values. This results in convergence, but is often very slow. A more sophisticated method of
120 D. T. Griffen
error minimization is the Levenberg-Marquardt approximation:

ΔW = (JᵀJ + μI)⁻¹ Jᵀe    (2)

where ΔW is the change in the weight vector, J is the Jacobian matrix of derivatives of each error with respect to each weight, μ is a scalar, and e is the error vector (Demuth & Beale, 1992). If μ is large, the method approaches gradient descent; if μ is small, it becomes the well known Gauss-Newton method of finding a minimum in the error-gradient surface (Prince, 1994). During training, μ is shifted so as to approach the Gauss-Newton method as quickly as possible, resulting in rapid convergence. Further discussion of network architecture, error analysis, and related topics can be found in De Wilde (1997), Chester (1993), Carling (1992), and many other books on the subject.

The three model clays used in this study were kaolinite, illite with 0.6 K atoms per formula unit, and Fe-free smectite with one water layer, a cation exchange capacity of 0.36 Eq/100 g, and Mg as the exchange cation. Powder patterns for the three pure phases were calculated with NEWMOD2© (Reynolds & Reynolds, 1987), assuming the geometry of a Scintag XDS-2000 theta-theta X-ray diffractometer and 'infinitely thick' clay samples of 2.5 cm length. These choices were made because the next phase of this study will involve real XRD patterns, which will be collected with such an instrument. The diffraction patterns are shown in Fig. 3. The calculation of powder patterns of mixtures of the three phases was done by straightforward linear combination of the patterns of the pure phases. Inasmuch as the purpose of this paper is to investigate the potential of neural computing in the quantitative phase analysis of clay mineral mixtures, rather than to consider how accurately NEWMOD2© can be made to model real, non-interstratified, multi-clay mixtures, differential absorption (matrix effects) caused by different mass absorption coefficients was ignored, a simplification that will be re-evaluated in succeeding studies involving real clay data.
FIG. 3. Calculated X-ray powder diffraction patterns for the three model clay minerals used (S = smectite, K = kaolinite, I = illite; intensity vs. °2θ to 45°). Cu-Kα radiation is assumed.
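The 'straightforward linear combination' used to build mixture patterns can be sketched as follows (the three-point intensity lists are placeholders standing in for full NEWMOD2-calculated patterns):

```python
def mixture_pattern(fractions, pure_patterns):
    # Weighted sum of pure-phase XRD patterns; differential (matrix)
    # absorption is ignored, as in the text.
    n = len(next(iter(pure_patterns.values())))
    mix = [0.0] * n
    for phase, f in fractions.items():
        for i, intensity in enumerate(pure_patterns[phase]):
            mix[i] += f * intensity
    return mix

# Placeholder intensity arrays (real patterns would have thousands of points)
pure = {
    "smectite": [9000.0, 100.0, 800.0],
    "kaolinite": [0.0, 2500.0, 50.0],
    "illite": [1200.0, 300.0, 900.0],
}
mix = mixture_pattern({"smectite": 0.5, "kaolinite": 0.3, "illite": 0.2}, pure)
```

The weight fractions passed in here are exactly the quantities the trained network is later asked to recover from the mixture pattern.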
In general, the number of neurons needed in a hidden layer for a given multilayer network is not known, but is determined by trial and error. It is clear, however, that the larger the number of inputs and outputs, the greater the required number of neurons (De Wilde, 1997). In addition, networks with too many neurons have a high probability of being over-fit, that is, of predicting the values in the target data of the training set very well, but oscillating wildly between the values on which they were trained. In order to minimize the input data while maintaining a high information content, each clay in a mixture was represented by a single datum that combined information from the first two diffraction maxima; this is obtained from a function here referred to as the 'clay characterization function' (CCF). It is well known that successive peaks in the XRD pattern obtained from an oriented clay specimen are separated by approximately equal distances when intensity is plotted as a function of 2θ. Because 2θ is not linearly related to d-value, however, the peaks are not exactly equally spaced (see Fig. 4a). If the pattern is plotted as a function of sinθ instead of 2θ, the distances between successive peaks are precisely equal, and inversely proportional to d-values. The CCF is calculated as follows:

(1) The raw X-ray data are corrected for background, and the Kα2 X-ray component is stripped from each peak. (This, of course, is irrelevant to the synthetic powder patterns used here, but very necessary for real samples, for which a background and Kα2 contribution are present.) If the very-low-angle intensity from the incident X-ray beam is not adequately subtracted by the background correction, then this is done manually. Because diffraction patterns are not usually started at 2θ = 0°, but the data must begin there in order to take advantage of the equal spacing of basal reflections, the 2θ and intensity vectors are extended backwards from the beginning of the collected data to 0°2θ by prefixing the appropriate diffraction angles to the 2θ vector, and prefixing an intensity value of zero to the intensity vector for each 2θ value so added.

(2) The 2θ values are replaced by corresponding sinθ values. Because 2θ and sinθ are not linearly related, the intervals of sinθ obtained at this stage are not equal, and this must be remedied. To do so, a convenient fixed interval of sinθ is chosen (for instance, the mean difference between successive sinθ values for the entire diffraction pattern), and the intensity values corresponding to these new sinθ values are determined by interpolation between the measured intensity values.

(3) The CCF is computed by multiplying the intensity at sinθi by the intensity at 2sinθi for all i, and plotting the resulting values as a function of d-spacing. Only intensities at positions where sinθ < 0.5(sinθmax) are used; for instance, if the powder pattern is run to 40°2θ, then the calculation of the CCF is carried out only to sinθ values corresponding to 20°2θ. Because all clays have maximum basal spacings at relatively low Bragg angles, all will have at least two X-ray peaks at 2θ < 40°.

While this calculation may seem involved, it can be carried out easily by computer as an essentially automatic process.

The CCFs for the three model clays used in this study are shown in Fig. 5. Note that the peaks corresponding to each clay are easily identified from their basal d-values. These powder patterns were calculated to 42°2θ, and both smectite and illite have four peaks within that range, resulting in two peaks on the CCF curves for each of them; the kaolinite CCF curve has only one peak in that range. To minimize the number of input data, only one peak was used for each mineral: the one plotted at the maximum d-value, because it yielded the highest CCF peak.

RESULTS

Initially, only the CCFs for mixtures lying on the edges and corners of the smectite-kaolinite-illite compositional triangle were used in the training set (Fig. 6). The CCFs were calculated for binary compositions at intervals of 5 wt%, yielding 60 values in the training set. The network was designed with one hidden layer, the Levenberg-Marquardt training algorithm was used, and several networks containing from 5-40 neurons in the hidden layer were tested. Networks with <10 neurons in the hidden layer did not converge. Those with between 11 and 15 neurons in the hidden layer were found to be underfitted; that is, they converged, but reproduced the compositions used in the training set poorly. Those with >20 were overfitted; that is, they reproduced the values in the training set very accurately, but were poor at predicting compositions between those in the training set. A network with 17 neurons in the hidden layer and log-sigmoid transfer functions gave good convergence (i.e. a relatively rapid
[FIG. 4. Calculated diffraction pattern plotted (a) as a function of 2θ, marking the unequal peak separations Δ2θ1, Δ2θ2 and Δ2θ3, and (b) on a rescaled axis marking the separations Δd1, Δd2 and Δd3.]
FIG. 5. The clay characterization functions (CCFs) for the three model clays used (S = smectite, K = kaolinite, I = illite), plotted as a function of d.
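Steps (2) and (3) of the CCF calculation can be sketched as follows. Step (1) is assumed to have been done already, so the pattern passed in starts at 0°2θ with the background removed; the wavelength constant and function names are illustrative, not from the study:

```python
import math

CU_KALPHA = 1.5406  # wavelength in angstroms (Cu-Ka1; an assumed value)

def interp(x, xs, ys):
    # Linear interpolation of y at x from sorted sample points (xs, ys).
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]

def ccf(two_theta_deg, intensity, n=200):
    # Step (2): re-express the pattern on an evenly spaced sin(theta) grid;
    # step (3): multiply I(sin t) by I(2 sin t) for sin t < 0.5 sin t_max,
    # reporting each product against the d-spacing lambda / (2 sin t).
    s = [math.sin(math.radians(tt / 2.0)) for tt in two_theta_deg]
    s_max = s[-1]
    curve = []
    for i in range(1, n):
        si = s_max * (i / n)
        if si >= 0.5 * s_max:
            break
        product = interp(si, s, intensity) * interp(2.0 * si, s, intensity)
        curve.append((CU_KALPHA / (2.0 * si), product))  # (d-spacing, CCF)
    return curve

two_theta = [0.5 * k for k in range(85)]      # 0 to 42 degrees 2-theta
flat = ccf(two_theta, [1.0] * 85)             # featureless test pattern
```

Because basal reflections of a clay fall at equal sinθ intervals, the product I(sinθ)·I(2sinθ) is large only where two successive orders coincide, which is what concentrates each clay's signature into a few CCF peaks.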
decrease in the sum-squared errors between predicted and true compositions), and good predictions for binary compositions in the test set (which, by definition, do not include those used in the training set) except for mixtures containing <5 wt% of one phase.

FIG. 6. Concentrations of smectite, kaolinite, and illite used in the training and test sets (ternary diagram; circles mark the training set).

With the edges of the compositional triangle modelled well, CCFs for 10 compositions within the triangle were added to the training set (Fig. 6). The same architecture was used, and the convergence history is shown in Fig. 7; convergence to a sum-squared error <0.001 was achieved in 648 training epochs (i.e. presentations of the training set to the network), which required ~12 min on a PowerMacintosh 8500/120. The fit to the training data, even though only a small percentage of the compositions were ternary, is excellent (Fig. 8). In order to test the network, it was used to predict the compositions of a test set of 27 mixtures (as required, none of them part of the training set); of these, 12 were binary mixtures and 15 were ternary. Figure 9 shows the agreement between predicted and true compositions. With the exception of one sample (symbols enclosed in dashed rectangles), the agreement is quite good; no explanation for the single exceptional sample is apparent. Because
FIG. 7. The convergence history of the final neural network (sum-squared error, plotted logarithmically, vs. training epoch). Convergence was achieved in 648 training epochs.
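The convergence criterion described above (training until the sum-squared error falls below a preset value, counting epochs) can be sketched with a small back-propagation network. For brevity this sketch uses plain gradient descent rather than Levenberg-Marquardt, and a made-up one-input mapping rather than CCF data; it is schematic, not the MATLAB Toolbox computation used in the study:

```python
import math
import random

def logsig(v):
    return 1.0 / (1.0 + math.exp(-v))

def train(samples, n_hidden=5, lr=0.5, target_sse=0.001, max_epochs=5000, seed=1):
    # One-hidden-layer back-propagation network, log-sigmoid throughout.
    # Weights start random; training stops when the sum-squared error over
    # the training set drops below target_sse, or after max_epochs.
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    sse = float("inf")
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        sse = 0.0
        for x, z in samples:
            h = [logsig(sum(wji * xi for wji, xi in zip(w1[j], x)) + b1[j])
                 for j in range(n_hidden)]
            y = logsig(sum(w2[j] * h[j] for j in range(n_hidden)) + b2)
            e = z - y
            sse += e * e
            # Back-propagate: output delta first, then hidden deltas.
            dy = e * y * (1.0 - y)
            dh = [dy * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w2[j] += lr * dy * h[j]
                b1[j] += lr * dh[j]
                for i in range(n_in):
                    w1[j][i] += lr * dh[j] * x[i]
            b2 += lr * dy
        if sse < target_sse:
            break
    return sse, epoch

# An easily learned one-input mapping (illustrative data, not clay CCFs)
data = [([k / 10.0], 0.2 + 0.06 * k) for k in range(11)]
final_sse, epochs = train(data)
```

The epoch count returned here is the direct analogue of the 648 epochs reported for the final network.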
FIG. 8. The fit of the predicted wt fractions as a function of the true concentrations of the three model clay minerals for the training data (+ smectite, × kaolinite, ○ illite).
[FIG. 9. Predicted vs. true concentrations (wt fractions) of the three model clay minerals for the test set (+ smectite, × kaolinite, ○ illite).]
using a previously trained neural network for prediction does not involve iteration, the computational time involved is trivial.

DISCUSSION AND CONCLUSIONS

This study has demonstrated the feasibility of using an artificial neural network for the quantitative phase analysis of mixtures of three clays. Extension to more than three clays is straightforward in principle, although the addition of many (or potentially many) clays to the problem may require a neural network with a different architecture (say, two hidden layers, or something other than a back-propagation network). Work to investigate that is underway. It is clear that a denser distribution of ternary mixtures in the training set would have improved the agreement shown in Fig. 9, but this was unnecessary for the purpose of demonstrating feasibility. The important question now is not whether performance for calculated clay mixtures can be improved, but whether the method developed with data from calculated powder patterns can be applied with equal efficacy to mixtures of real clays. A second question of significance for the practical application of the method is whether the entire procedure can be automated to make it convenient for clay analysts who do not wish to become enmeshed in the details of CCF calculations, network architecture, etc.

Real clays present some problems not encountered in the present work. As in any method involving XRD of clay minerals, sample preparation is of utmost importance, and one of the initial challenges will be to ensure that NEWMOD2© properly models peak intensities and peak shapes for a variety of real clays prepared using standardized methods. In addition, there may be other variables available from the CCF that would be useful, besides peak height. For calculated powder patterns, using the CCF provides no advantage over using the heights of XRD maxima, because the calculations take no account of the sample preparation problems or random errors which plague clay analysts. For real clays, however, the use of the CCF should tend to 'smooth out' random errors in X-ray peak intensities and provide information from two
peaks rather than just one. In addition, because non-layer minerals do not present peaks at uniform spacings, peaks due to non-clay minerals in the clay fraction will be eliminated by use of the CCF. Furthermore, the use of other information available from the CCF (e.g. using two CCF peaks per clay mineral, and peak widths and positions as well as heights) might make it possible to add additional functionality to the neural network. As an example of this type of enhancement, it might prove possible to add the Fe content of smectites or the K content of illites as a variable to be estimated by the neural network. The quantitative characterization of mixed-layer clays may also be amenable to analysis by neural computing, although the CCF will probably not be useful in that case, and some architecture other than back-propagation may yield superior performance for that very complex problem.

ACKNOWLEDGMENTS

Very helpful reviews of the manuscript were provided by Jeff Walker and Steve Hillier. In addition, the author benefited from interaction with clay scientists convened at the Macaulay Land Use Research Institute, Aberdeen, Scotland, for the Golden Jubilee Meeting of the Clay Minerals Group of the Mineralogical Society of Great Britain and Ireland in April 1997, where an early version of this work was reported.

REFERENCES

Bischof H., Schneider W. & Pinz A.J. (1992) Multispectral classification of Landsat images using neural networks. IEEE Trans. Geosc. Rem. Sens. 30, 482-490.

Bish D.L. (1993) Studies of clays and clay minerals using X-ray powder diffraction and the Rietveld method. Pp. 79-121 in: Computer Applications to X-ray Powder Diffraction Analysis of Clay Minerals (J.R. Walker & R.C. Reynolds, Jr., editors). CMS Workshop Lectures, 5, The Clay Minerals Society, Boulder, CO.

Carling A. (1992) Introducing Neural Networks. Sigma Press, Cheshire, England.

Carr J.R. & Hibbard M.J. (1991) Open-ended mineralogical/textural rock classification. Comp. Geosc. 17, 1409-1463.

Chester M. (1993) Neural Networks: A Tutorial. Prentice-Hall, Englewood Cliffs, New Jersey.

Demuth H. & Beale M. (1992) Neural Network Toolbox for Use with MATLAB. The MathWorks, Natick, MA.

De Wilde P. (1997) Neural Network Models, 2nd edition. Springer-Verlag, London.

Griffen D.T., Griffen B.T. & Secrest C.D. (1995) Toward an artificial neural network for the modal analysis of rocks from X-ray diffraction data. Geol. Soc. Am. Ann. Meet., Abstracts with Prog. 27, A-195.

Huang Y., Wong P.M. & Gedeon T.D. (1996) An improved fuzzy neural network for permeability estimation from wireline logs in a petroleum reservoir. Proc. IEEE Region Ten Conf. (TENCON) on Digital Signal Proc. Appl. 2, 912-917.

Jones R.C. (1989) A computer technique for X-ray diffraction curve fitting/peak decomposition. Pp. 52-101 in: Quantitative Mineral Analysis of Clays (D.R. Pevear & F.A. Mumpton, editors). CMS Workshop Lectures, 1, The Clay Minerals Society, Boulder, CO.

Pevear D.R. & Schuette J.F. (1993) Inverting the NEWMOD© X-ray diffraction forward model for clay minerals using genetic algorithms. Pp. 19-41 in: Computer Applications to X-ray Powder Diffraction Analysis of Clay Minerals (J.R. Walker & R.C. Reynolds, Jr., editors). CMS Workshop Lectures, 5, The Clay Minerals Society, Boulder, CO.

Prince E. (1994) Mathematical Techniques in Crystallography and Materials Science. Springer-Verlag, Berlin.

Reynolds R.C. (1985) NEWMOD©, a Computer Program for the Calculation of Basal X-ray Diffraction Intensities of Mixed-Layered Clays. 8 Brook Road, Hanover, NH 03755, USA.

Reynolds R.C. Jr. & Reynolds R.C. III (1987) Description of Program NEWMOD2 for the Calculation of the One-Dimensional X-ray Diffraction Patterns of Mixed-Layered Clays. 8 Brook Road, Hanover, NH 03755, USA.

Walker J.R. (1993) An introduction to computer modeling of X-ray diffraction patterns of clay minerals: a guided tour of NEWMOD©. Pp. 1-17 in: Computer Applications to X-ray Powder Diffraction Analysis of Clay Minerals (J.R. Walker & R.C. Reynolds, Jr., editors). CMS Workshop Lectures, 5, The Clay Minerals Society, Boulder, CO.