f_k = Σ_j a_{j,k} φ_j + ε_k        (1)

with ε_k ∈ ℝ^D being the reconstruction error vector and a_{j,k} being the weighting coefficients with a_{j,k} ∈ [0, 1], Σ_j a_{j,k} = 1 and a_k = (a_{1,k}, ..., a_{M,k}). We define a cost function E_k for f_k as

E_k = ‖ε_k‖² + λ S_k        (2)
which has to be minimized. It contains a regularization term S_k, weighted by λ > 0, which judges the sparseness of the representation. It can be defined as

S_k = Σ_j g(a_{j,k})        (3)
whereby g(x) is a non-linear function like exp(−x²), log(1/(1+x²)), etc. Another choice would be to take

S_k = H(a_k)        (4)
being the entropy of the vector a_k. We remark that minimum sparseness is achieved iff a_{j,k} = 1 for exactly one arbitrary j and zero elsewhere. Using this minimum scenario, optimization is reduced to the minimization of the description errors ‖ε_k‖², or, equivalently, to the optimization of the basis functions φ_j.
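As a quick numerical check (a minimal sketch in Python/NumPy; the example coefficient vectors, the helper names and the small constant inside the entropy are our illustrative assumptions), the one-hot vector of the minimum scenario indeed gives the smallest value for all three choices of S_k:

```python
import numpy as np

# Numerical check of the sparseness measures in Eqs. (3) and (4); the example
# coefficient vectors and function names are illustrative assumptions.

def sparseness_g(a, g):
    """S_k = sum_j g(a_{j,k}), Eq. (3), for a chosen non-linearity g."""
    return np.sum(g(a))

def sparseness_entropy(a, eps=1e-12):
    """S_k = H(a_k), Eq. (4): Shannon entropy of the coefficient vector."""
    return -np.sum(a * np.log(a + eps))

a_sparse = np.array([1.0, 0.0, 0.0, 0.0])     # the minimum-sparseness scenario
a_dense = np.array([0.25, 0.25, 0.25, 0.25])  # maximally spread coefficients

for a in (a_sparse, a_dense):
    print(sparseness_g(a, lambda x: np.exp(-x ** 2)),
          sparseness_g(a, lambda x: np.log(1.0 / (1.0 + x ** 2))),
          sparseness_entropy(a))
# The one-hot vector yields the smaller value under all three measures.
```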
The span for a set of data vectors consists of vectors φ_j chosen as principal components. A minimal principal component analysis requires at least the determination of the first principal component; taking higher components into account improves the approximation. However, as mentioned above, if the data space is non-linear, principal component analysis (PCA) may be suboptimal. In NMR spectroscopy, one possible way to overcome this problem is to split the data space into continuous patches, to build homogeneous subsets on these patches, and to carry out a PCA on each subset, taking only the first principal component. The respective approach to determine the principal component is a combination of adaptive PCA (Oja-learning, [9]) and prototype-based vector quantization (neural gas, [8]), called sparse coding neural gas.
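To make this concrete, here is a rough sketch of a fixed "split plus local PCA" baseline (Python/NumPy; the random placeholder data and the arbitrary fixed partition are our assumptions), in contrast to which SCNG, described next, learns the subsets and their principal components adaptively:

```python
import numpy as np

# Fixed split of the data into subsets and one PCA per subset, keeping only
# the first principal component. Data and partition are placeholders.

rng = np.random.default_rng(0)
F = rng.normal(size=(300, 64))              # 300 functional data vectors
patch_id = rng.integers(0, 5, size=300)     # placeholder partition, 5 subsets

first_pcs = []
for i in range(5):
    X = F[patch_id == i]
    X = X - X.mean(axis=0)                  # center the subset
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    first_pcs.append(Vt[0])                 # first principal component p_i
```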
2.2.2 Sparse coding neural gas (SCNG)

We now briefly describe the basic variant of SCNG according to [7]. In SCNG, N prototypes W = {w_i} approximate the first principal components p_i of the subsets Ω_i. A functional data vector f_k belongs to Ω_i iff its correlation to p_i, defined by the inner product O(w_i, f_k) = ⟨w_i, f_k⟩, is maximal:

Ω_i = { f_k | i = argmax_j ⟨p_j, f_k⟩ }        (5)
The approximations w_i can be obtained adaptively by Oja-learning, starting with random vectors w_i at time t = 0 with |w_i| = 1. Let P be the probability density in Ω_i. Then, for each time step t, a data vector f_k ∈ Ω_i is selected according to P and the prototype w_i is updated by

Δw_i = α_t O(w_i, f_k) (f_k − O(w_i, f_k) w_i)        (6)
with learning rate α_t > 0, α_t → 0 for t → ∞, Σ_t α_t = ∞ and Σ_t α_t² < ∞, which yields a converging stochastic process [6]. The final limit of the process is w_i = p_i [9].
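As a small illustration of this Oja update (Python/NumPy; the toy data and the concrete schedule α_t = 1/(t + 100) are our assumptions, not the paper's settings), a single prototype converges to the first principal component of its subset:

```python
import numpy as np

# Minimal sketch of the Oja update in Eq. (6). The schedule 1/(t + 100)
# fulfils sum alpha_t = inf and sum alpha_t^2 < inf, as required above.

rng = np.random.default_rng(1)
F = rng.normal(size=(1000, 32))        # samples from one subset Omega_i
w = rng.normal(size=32)
w /= np.linalg.norm(w)                 # |w| = 1 at t = 0

for t in range(20000):
    f = F[rng.integers(len(F))]        # select f_k according to P
    alpha_t = 1.0 / (t + 100)
    O = np.dot(w, f)                   # O(w, f_k) = <w, f_k>
    w += alpha_t * O * (f - O * w)     # Eq. (6)

# w now approximates the first principal component p_i (up to sign)
```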
Yet, the subsets Ω_i are initially unknown, while the assignment (5) already requires knowledge of their first principal components p_i. This problem is solved in analogy to the original neural gas in vector quantization [8]. For a randomly selected functional data vector f_k (chosen according to P), the correlation O(w_i, f_k) is determined for each prototype and the rank r_i is computed according to

r_i(f_k, W) = N − Σ_{j=1}^{N} θ(O(w_i, f_k) − O(w_j, f_k))        (7)
counting the number of pointers w_j for which the relation O(w_i, f_k) < O(w_j, f_k) is valid [8]; θ(x) is the Heaviside function. All prototypes are updated according to

Δw_i = α_t h_{σ_t}(f_k, W, i) O(w_i, f_k) (f_k − O(w_i, f_k) w_i)        (8)
where

h_{σ_t}(f_k, W, i) = exp(−r_i(f_k, W)/σ_t)        (9)

is the so-called neighborhood function with neighborhood range σ_t > 0. Thus, the update strength of each prototype is correlated with its matching ability. Further, the temporary data subset Ω_i(t) for a given prototype is

Ω_i(t) = { f_k | i = argmax_j ⟨w_j, f_k⟩ }        (10)
For t → ∞ the neighborhood range is decreased as σ_t → 0 and, hence, in the limit only the best matching prototype is updated in (8). Then, in the equilibrium of the stochastic process (8), one has Ω_i(t) → Ω_i for a certain subset configuration which is related to the shape of the data space and the density P [16]. Further, one gets w_i = p_i in the limit. Both results are in complete analogy to the usual neural gas, because the maximum over inner products is mathematically equivalent to the minimum of the Euclidean distance between the vectors [5], [8].
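The following compact sketch puts Eqs. (5)-(9) together (Python/NumPy; the toy data, the number of prototypes and both annealing schedules are our assumptions, not settings from the paper):

```python
import numpy as np

# One SCNG run combining the rank (Eq. 7), the neighborhood (Eq. 9) and the
# prototype update (Eq. 8); data and schedules are illustrative only.

rng = np.random.default_rng(2)
F = rng.normal(size=(500, 32))                     # functional data vectors
F /= np.linalg.norm(F, axis=1, keepdims=True)
N, T = 5, 10000                                    # prototypes, training steps
W = rng.normal(size=(N, 32))
W /= np.linalg.norm(W, axis=1, keepdims=True)

for t in range(T):
    f = F[rng.integers(len(F))]                    # f_k drawn according to P
    O = W @ f                                      # correlations O(w_i, f_k)
    ranks = np.sum(O[:, None] < O[None, :], axis=1)     # r_i as in Eq. (7)
    sigma_t = 2.0 * (0.01 / 2.0) ** (t / T)        # neighborhood range -> 0
    alpha_t = 0.5 * (0.001 / 0.5) ** (t / T)       # learning rate -> 0
    h = np.exp(-ranks / sigma_t)                   # neighborhood, Eq. (9)
    W += alpha_t * (h * O)[:, None] * (f - O[:, None] * W)  # Eq. (8)

# Eq. (10): each f_k is finally assigned to its best matching prototype
assignment = np.argmax(F @ W.T, axis=1)
```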
2.2.3 Classification with Fuzzy Labeled Self Organizing Map

The sparse coded spectra have been fed into a special variant of a self organizing map, called Fuzzy Labeled Self Organizing Map (FL-SOM), as given in [12]. We do not detail FL-SOM here, but mention that it generates a classifier and a topological mapping of the data. The parameters of the FL-SOM are: map size 5 × 10, final neighborhood range 0.75, with the remaining parameters as in [12]. The map has been trained up to convergence as specified in [12]. To obtain the sparse coding of the NMR data, the spectra were split into 90 so-called patches, which are fragments of the NMR signal (see [7]), with a width of 200 points, motivated by the DSS width. For the SCNG algorithm, 30 prototypes have been used, determined by a grid search over different values.
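A minimal sketch of such a patch extraction (Python/NumPy; that the patches are consecutive and non-overlapping, and the resulting spectrum length of 90 × 200 points, are our assumptions — only the counts 90 and 200 come from the text):

```python
import numpy as np

# Cut a spectrum into consecutive fragments ("patches") of a fixed width.
# The placeholder signal below is random; real input would be an NMR spectrum.

def split_into_patches(spectrum, width=200):
    """Return consecutive, non-overlapping fragments of the signal."""
    n = len(spectrum) // width
    return spectrum[: n * width].reshape(n, width)

spectrum = np.random.default_rng(3).normal(size=90 * 200)  # placeholder
patches = split_into_patches(spectrum)                     # shape (90, 200)
```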
We would like to mention that the number of prototypes did not significantly influence the results, but it should be chosen in accordance with the diversity of the substructures expected in the overall dataset. The sparse coding achieved a dimensionality reduction by a factor of 10.

Table 1. Classification of metabolites using peak lists. Simulated metabolites are almost perfectly recovered (A, G, Y, S), whereas for the unknown mixtures some misidentifications can be observed.

A: .9 .1
G: .8
Y: 1 .2
S: .9 .2
12: 1
13: .9
14: 1
23: .8
24: .8
34: 1
3 Experiments and Results
Here we compare the peak picking encoding and the sparse coding for a set of simulated metabolite spectra. We consider four types of metabolites relevant in metabolic studies of the stem cell: Alanine (Ala), Glutamine (Gln), Glycine (Gly) and Serine (Ser), simulated at 39 different, linearly increasing concentration levels (1-39). Hence we obtain 156 spectra, simulated using the previously mentioned NMR system parameters. Additionally, we generated mixtures of these metabolites by combining two metabolites in all pairwise combinations, again with 39 concentration levels. This gives 6 × 39 mixture spectra, which are not used in the subsequent training steps but are used for external validation. All spectra are processed as mentioned above and either encoded into peak lists or, alternatively, encoded by sparse coding. The results for the peak based approach are collected in Table 1 and those for the sparse coding in Table 2 (Alanine - A, Glutamine - G, Glycine - Y, Serine - S). Thereby, the peak lists of the patterns are directly matched against the peak lists of the measurement using a tolerance of 0.005 ppm.
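A minimal sketch of such a matching step (plain Python; the peak positions and helper names are hypothetical, and the 75% acceptance rule is the criterion used in the text for counting a classification):

```python
# A pattern peak counts as matched if some measured peak lies within
# 0.005 ppm of it; an identification is counted if enough peaks match.

TOL = 0.005  # ppm

def fraction_matched(pattern_peaks, measured_peaks, tol=TOL):
    hits = sum(
        any(abs(p - m) <= tol for m in measured_peaks) for p in pattern_peaks
    )
    return hits / len(pattern_peaks)

pattern = [1.48, 3.78]                    # hypothetical pattern peaks (ppm)
measured = [1.4812, 3.7779, 7.26]         # hypothetical measured peaks (ppm)
identified = fraction_matched(pattern, measured) >= 0.75
```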
For the peak lists we observe a very good recognition as well as prediction accuracy. On average, the recognition (on the 4 training classes) is 91% and on the unknown 6 mixture classes it is 90%. It should be noted that the fractions in a column of Table 1 do not necessarily accumulate to 1.0 = 100%, because the peak based identification is not forced to identify one of the metabolites in each analysis (at least 75% of the peaks had to match to count the classification).

Table 2. Classification of metabolites using sparse coding. Pure metabolites are almost perfectly recovered (A, G, Y, S), whereas for the unknown mixture data stronger misidentifications are observed.

A: .8
G: 1 1 .4 1
Y: 1
S: .8
12: -
13: .2 1 .7
14: .3
23: .2 1
24: .6
34: -
The sparse coded data have been analyzed using the FL-SOM; the obtained map was topology preserving (topographic error < 0.05). The model has been trained with the 4 classes of metabolites. The FL-SOM model generates a fuzzy labeling of the underlying prototypes and hence is also able to give assignments to more than one class. Using a majority vote scheme to classify the data, the training data have been learned with 100% accuracy. But we also wanted to determine the 6 new mixture classes. To do this, we defined prototypical labels for each class, such as (1, 0, 0, 0) for class 1 (Alanine) and (0.5, 0.5, 0, 0) for the mixture class of Alanine and Glutamine. Spectra were assigned to the closest prototype and classified by the prototypical label which is closest, in Euclidean distance, to the fuzzy label of that prototype. For example, let a data point v have the fuzzy label (1, 0, 0, 0), which assigns it to class 1, Alanine. Let further some prototype w be the winner (closest prototype) for this data point, with a fuzzy label of (0.6, 0.4, 0, 0). Then two classifications are possible. Using majority vote, the prototype label becomes (1, 0, 0, 0) and hence the data point is assigned to class 1, Alanine, which is correct. Using the alternative scheme, the fuzzy label (0.6, 0.4, 0, 0) is closer to (0.5, 0.5, 0, 0) than to (1, 0, 0, 0); hence the prototype is labeled as a mixture of Alanine and Glutamine, and consequently the data point is assigned to the 1/2 (Alanine/Glutamine) class, leading to a (in this case) wrong classification, because the data point was labeled as Alanine.
Using this scheme and considering the receptive fields of the prototypes of the FL-SOM, the picture is a bit different, as shown in Table 2. On average, the recognition (on the 4 training classes) becomes 87%, and on the unknown 6 mixture classes we obtain 50%. However, it should be noted that the FL-SOM classifier never saw the mixture classes during training, but nevertheless learned prototypes which are located between different classes in a topology preserving manner. The errors for the mixtures Ala/Gln and Gly/Ser are caused by the fact that no prototypes representing these mixtures were learned on the map, as depicted in Figure 2.

Figure 2. FL-SOM (bar plot) for the 4 metabolite classes. The map is given as a 5 × 10 grid, and for each grid node 4 bars are depicted, indicating the fuzzy values for the respective classes. As shown in the picture (indicated by manually added closed regions, e.g. ellipsoids in the corners), the map contains receptive fields with high responsibility for a single class, but there is also a region of the map responsible for data points which are topologically located between different classes, in our case metabolite mixtures.
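For illustration, a minimal sketch of the two labeling schemes (Python/NumPy; the prototypical label vectors follow the text, while the function names are ours):

```python
import numpy as np

# Majority vote versus nearest prototypical label, for the worked example
# above: a winner prototype with fuzzy label (0.6, 0.4, 0, 0).

CLASSES = ["A", "G", "Y", "S", "12", "13", "14", "23", "24", "34"]
PROTOTYPICAL = np.array([
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],   # pure classes
    [.5, .5, 0, 0], [.5, 0, .5, 0], [.5, 0, 0, .5],           # mixtures
    [0, .5, .5, 0], [0, .5, 0, .5], [0, 0, .5, .5],
])

def majority_vote(fuzzy_label):
    return CLASSES[int(np.argmax(fuzzy_label))]     # strongest pure class

def nearest_prototypical(fuzzy_label):
    d = np.linalg.norm(PROTOTYPICAL - fuzzy_label, axis=1)
    return CLASSES[int(np.argmin(d))]               # closest label vector

w_label = np.array([0.6, 0.4, 0.0, 0.0])            # winner prototype's label
print(majority_vote(w_label))         # -> "A"  (correct for an Alanine point)
print(nearest_prototypical(w_label))  # -> "12" (Ala/Gln mixture, here wrong)
```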
4 Conclusions
We presented a method for the sparse coded representation of functional data applied in NMR spectroscopy and compared it to an alternative peak based approach. Both approaches were able to recognize the pure metabolite spectra at different concentration levels. For the analysis of mixtures, the peak picking approach performed better, but this result is potentially biased because the simulated data always show a perfect peak shape. For the SCNG approach we found promising results: the metabolic information encoded in the spectra could be preserved, and a significant data reduction by a factor of 10 was achieved. The SCNG provided a sufficient and accurate data reduction such that the FL-SOM classifier could be used in a topology preserving manner. The SCNG encoding also allows the application of other data analysis methods, such as different classifiers or statistical tests, which need a compact data representation. The SCNG generated a compact and discriminative encoding. Future directions of improvement will focus on a better combination of sparse coded data and the FL-SOM, the additional integration of NMR specific knowledge, and an advanced determination of the patches. In a next step, all methods will be analyzed on the basis of real NMR metabolite and NMR cell extract measurements.

Acknowledgment: We are grateful to Thomas Riemer, IZKF, Leipzig University.
References

[1] David Chang, Cory D. Banack, and Sirish L. Shah. Robust baseline correction algorithm for signal dense NMR spectra. Journal of Magnetic Resonance, 187(2):288-292, 2007.
[2] Li Chen, Zhiqiang Weng, Laiyoong Goh, and Marc Garland. An efficient algorithm for automatic phase correction of NMR spectra based on entropy minimization. Journal of Magnetic Resonance, 158(1-2):164-168, 2002.
[3] M. Cross, R. Alt, and D. Niederwieser. The case for a metabolic stem cell niche. Cells Tissues Organs, in press, 2008.
[4] S. W. Homans. A Dictionary of Concepts in NMR. Clarendon Press, Oxford, 1992.
[5] Teuvo Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, Heidelberg, 1995. (Second Extended Edition 1997).
[6] H. J. Kushner and D. S. Clark. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, New York, 1978.
[7] K. Labusch, E. Barth, and T. Martinetz. Learning data representations with sparse coding neural gas. In M. Verleysen, editor, Proceedings of the European Symposium on Artificial Neural Networks (ESANN), in press. d-side publications, 2008.
[8] Thomas M. Martinetz, Stanislav G. Berkovich, and Klaus J. Schulten. Neural-gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4):558-569, 1993.
[9] E. Oja. Neural networks, principal components, and subspaces. International Journal of Neural Systems, 1:61-68, 1989.
[10] B. Olshausen and D. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607-609, 1996.
[11] T. H. Park. Towards Automatic Musical Instrument Timbre Recognition. PhD thesis, Princeton University, 2004.
[12] F.-M. Schleif, T. Villmann, and B. Hammer. Prototype based fuzzy classification in clinical proteomics. International Journal of Approximate Reasoning, 47(1):4-16, 2008.
[13] S. A. Smith, T. O. Levante, B. H. Meier, and R. R. Ernst. Computer simulations in magnetic resonance: an object-oriented programming approach. Journal of Magnetic Resonance, Series A, 106:75-105, 1994.
[14] Lucksanaporn Tarachiwin, Koichi Ute, Akio Kobayashi, and Eiichiro Fukusaki. 1H NMR based metabolic profiling in the evaluation of Japanese green tea quality. Journal of Agricultural and Food Chemistry, 55(23):9330-9336, 2007.
[15] M. Verleysen and D. Francois. The curse of dimensionality in data mining and time series prediction. In J. Cabestany, A. Prieto, and F. S. Hernandez, editors, Computational Intelligence and Bioinspired Systems, Proceedings of the 8th International Work-Conference on Artificial Neural Networks 2005 (IWANN), Barcelona.
[16] T. Villmann and J.-C. Claussen. Magnification control in self-organizing maps and neural gas. Neural Computation, 18(2):446-469, 2006.
[17] B. Williams, S. Cornett, B. Dawant, A. Crecelius, B. Bodenheimer, and R. Caprioli. An algorithm for baseline correction of MALDI mass spectra. In Proceedings of the 43rd Annual Southeast Regional Conference, Volume 1, pages 137-142, Kennesaw, Georgia, 2005. ACM.