
Kernel Isomap

Heeyoul Choi and Seungjin Choi



Department of Computer Science, POSTECH, San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea
{hychoi,seungjin}@postech.ac.kr
January 15, 2005

Abstract

Isomap [4] is a manifold learning algorithm, which extends classical multidimensional scaling (MDS) by considering approximate geodesic distance instead of Euclidean distance. The approximate geodesic distance matrix can be interpreted as a kernel matrix, which implies that Isomap can be solved by a kernel eigenvalue problem. However, the geodesic distance kernel matrix is not guaranteed to be positive semidefinite. In this letter we employ a constant-adding method, which leads to the Mercer kernel-based Isomap algorithm. Numerical experimental results with noisy Swiss roll data confirm the validity and high performance of our kernel Isomap algorithm.

Indexing terms: Embedding, Kernel PCA, Manifold learning, MDS, Nonlinear dimensionality reduction.

Appeared in Electronics Letters, vol. 40, no. 25, pp. 1612-1613, December 2004.

1 Please address correspondence to Prof. Seungjin Choi, Department of Computer Science, POSTECH, San 31 Hyoja-dong, Nam-gu, Pohang 790-784, Korea, Tel: +82-54-279-2259, Fax: +82-54-279-2299, Email: [email protected]

Introduction

Manifold learning involves inducing a smooth nonlinear low-dimensional manifold from a set of data points drawn from the manifold. Isomap is a representative isometric mapping, which extends metric MDS by considering approximate geodesic distance instead of Euclidean distance [4]. Relationships between classical scaling and PCA were well studied in [1]: projecting the input data onto the eigenvectors of the sample covariance matrix returns the classical scaling solution. If the kernel function is isotropic, kernel PCA can be interpreted as performing metric MDS [5]. The approximate geodesic distance matrix computed in Isomap can be interpreted as a kernel matrix [2]. However, the kernel matrix based on the doubly centered approximate geodesic distance matrix is not always positive semidefinite. We exploit a constant-adding method such that the geodesic distance-based kernel matrix is guaranteed to be positive semidefinite. The resulting Mercer kernel-based Isomap algorithm has a generalization property, so that test data points can be projected using the kernel trick as in kernel PCA [3], whereas embedding methods in general (including Isomap) lack such a property.

The Kernel Isomap Algorithm

Given N objects, each represented by an m-dimensional vector $x_t$, $t = 1, \ldots, N$, the kernel Isomap algorithm finds a mapping which places the N points in a low-dimensional space.

Algorithm Outline: Kernel Isomap

Step 1. Identify the k nearest neighbors of each input data point and construct a neighborhood graph, where edge lengths between points in a neighborhood are set to their Euclidean distances.

Step 2. Compute approximate geodesic distances, $d_{ij}$, as shortest-path lengths between all pairs of points, and define $D^2 = [d_{ij}^2] \in \mathbb{R}^{N \times N}$.

Step 3. Construct a matrix $K(D^2)$ based on the approximate geodesic distance matrix $D^2$:
$$K(D^2) = -\frac{1}{2} H D^2 H, \qquad (1)$$
where $H$ is the centering matrix, given by $H = I - \frac{1}{N} e e^T$, and $e = [1, \ldots, 1]^T \in \mathbb{R}^N$.

Step 4. Compute the largest eigenvalue, $c^*$, of the matrix
$$\begin{bmatrix} 0 & 2K(D^2) \\ -I & -4K(D) \end{bmatrix}, \qquad (2)$$
where $K(D) = -\frac{1}{2} H D H$, and construct a Mercer kernel matrix $\widetilde{K} = K(\widetilde{D}^2)$ by
$$\widetilde{K} = K(D^2) + 2c\,K(D) + \frac{1}{2} c^2 H, \qquad (3)$$
where $\widetilde{K}$ is guaranteed to be positive semidefinite for $c \geq c^*$.

Step 5. Compute the top n eigenvectors of $\widetilde{K}$, which leads to the eigenvector matrix $V \in \mathbb{R}^{N \times n}$ and the eigenvalue matrix $\Lambda \in \mathbb{R}^{n \times n}$.

Step 6. The coordinates of the N points in the n-dimensional Euclidean space are given by $Y = \Lambda^{1/2} V^T$.

A main difference between the original Isomap and our kernel Isomap lies in Step 4, which is related to the additive constant problem that was well studied in metric MDS. The additive constant problem aims at finding a value of the constant, c, such that the dissimilarities defined by
$$\widetilde{d}_{ij} = d_{ij} + c\,(1 - \delta_{ij}) \qquad (4)$$

have a Euclidean representation for all $c \geq c^*$, where $\delta_{ij}$ is the Kronecker delta. Substituting $\widetilde{d}_{ij}$ from (4) for $d_{ij}$ in (1) gives Eq. (3). For $\widetilde{K}$ to be positive semidefinite, it is required that $x^T \widetilde{K} x \geq 0$ for all $x$. Cailliez showed that $c^*$ is given by the largest eigenvalue of the matrix in (2) (see Sec. 2.2.8 in [1]).

The matrix $\widetilde{K}$ is a Mercer kernel matrix, so its $(i,j)$-element is represented by $\widetilde{K}_{ij} = k(x_i, x_j) = \phi^T(x_i)\,\phi(x_j)$, where $\phi(\cdot)$ is a nonlinear mapping onto a feature space or a low-dimensional manifold. The coordinates in the feature space can be easily computed by projecting the centered data matrix onto the normalized eigenvectors of the sample covariance matrix in the feature space, $C = \frac{1}{N} (\Phi H)(\Phi H)^T$, where $\Phi = [\phi(x_1), \ldots, \phi(x_N)]$. The kernel Isomap algorithm thus provides a generalization property (projection property), which yields a mapping for a test data point.
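The training phase above (Steps 1-6 with the constant-adding correction) can be condensed into a short numerical sketch. The following Python/NumPy code is a minimal illustration under stated assumptions, not the authors' implementation: the function name kernel_isomap, the use of scikit-learn's kneighbors_graph and SciPy's shortest_path, and the parameter names are our own choices.

```python
# Minimal sketch of the kernel Isomap training phase (Steps 1-6).
# Assumes the k-NN neighborhood graph is connected.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def kernel_isomap(X, n_neighbors=5, n_components=2):
    """Embed the rows of X (N x m) into n_components dimensions."""
    N = X.shape[0]

    # Step 1: k-nearest-neighbor graph with Euclidean edge lengths.
    G = kneighbors_graph(X, n_neighbors, mode='distance')

    # Step 2: approximate geodesic distances via shortest paths (Dijkstra).
    D = shortest_path(G, method='D', directed=False)

    # Step 3: doubly centered matrices K(D^2) and K(D), cf. Eq. (1).
    H = np.eye(N) - np.ones((N, N)) / N
    KD2 = -0.5 * H @ (D ** 2) @ H
    KD = -0.5 * H @ D @ H

    # Step 4: Cailliez constant c* = largest eigenvalue of the 2N x 2N
    # matrix in (2), then the Mercer kernel of Eq. (3).
    Z = np.block([[np.zeros((N, N)), 2.0 * KD2],
                  [-np.eye(N),      -4.0 * KD]])
    c = np.max(np.real(np.linalg.eigvals(Z)))
    K = KD2 + 2.0 * c * KD + 0.5 * c ** 2 * H

    # Steps 5-6: top eigenvectors of the (symmetrized) Mercer kernel,
    # coordinates Y = Lambda^{1/2} V^T, returned here with points as rows.
    eigvals, eigvecs = np.linalg.eigh((K + K.T) / 2.0)
    idx = np.argsort(eigvals)[::-1][:n_components]
    V, lam = eigvecs[:, idx], eigvals[idx]
    Y = V * np.sqrt(np.maximum(lam, 0.0))
    return Y, V, lam, c
```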

Given a test data point $x \in \mathbb{R}^m$, its low-dimensional image, $y \in \mathbb{R}^n$, is estimated by collecting projections onto the eigenvectors of $C$. As in kernel PCA, the kernel trick leads to
$$y_j = \sum_{i=1}^{N} V_{ij}\, k(x_i, x), \qquad (5)$$
where $V_{ij}$ is the $i$th element of $v_j$ and $y_j$ is the $j$th element of $y$.
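Computationally, Eq. (5) is a matrix-vector product between the eigenvector matrix V and the vector of kernel values evaluated at the test point. The helper below is a minimal sketch under that reading; how the geodesic kernel values k(x_i, x) are obtained for a new point (e.g., via shortest paths from x through the training neighborhood graph) is not spelled out in this letter, so the k_new argument is simply assumed to be available.

```python
import numpy as np

def project(k_new, V):
    """Map a test point into the embedding space via Eq. (5).

    k_new : length-N vector with entries k(x_i, x) for the test point x
            (assumed to be supplied by the caller).
    V     : N x n matrix of the top eigenvectors of the Mercer kernel.
    """
    # y_j = sum_i V_ij * k(x_i, x)  for j = 1, ..., n
    return V.T @ k_new
```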

Numerical Experiments

We compare our kernel Isomap algorithm to the original Isomap algorithm using Swiss roll data, which was also used in the original Isomap work. Noisy Swiss roll data were generated by adding isotropic Gaussian noise with zero mean and 0.5 variance (see Fig. 1(a)). In the training phase, 1200 data points were used, and the neighborhood graph was constructed using the k = 5 nearest neighbors of each data point. As in Isomap, the shortest paths were computed using Dijkstra's algorithm in order to calculate approximate geodesic distances. An exemplary embedding result (onto a 3-dimensional feature space) for Isomap and kernel Isomap is shown in Fig. 1(b) and (c). Due to space limitations, we show only this result, but for noise-free Swiss roll data and noisy semi-sphere data our kernel Isomap algorithm also outperformed Isomap. The generalization property of our kernel Isomap is shown in Fig. 1(d), where 3000 test data points are embedded while preserving local isometry well.
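As a rough illustration of this setup, the snippet below generates noisy Swiss roll data and embeds it with the kernel_isomap sketch given earlier. The scikit-learn generator make_swiss_roll and the random seed are our own choices, not the authors' original data.

```python
# Illustrative reproduction of the training setup: 1200 Swiss roll points
# corrupted by zero-mean Gaussian noise with variance 0.5, embedded with
# k = 5 neighbors into a 3-dimensional feature space.
import numpy as np
from sklearn.datasets import make_swiss_roll

rng = np.random.default_rng(0)
X, _ = make_swiss_roll(n_samples=1200, random_state=0)
X_noisy = X + rng.normal(0.0, np.sqrt(0.5), size=X.shape)  # variance 0.5

Y, V, lam, c = kernel_isomap(X_noisy, n_neighbors=5, n_components=3)
print(Y.shape)  # (1200, 3) embedding, cf. Fig. 1(c)
```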

Conclusion

We have presented the kernel Isomap algorithm, where the approximate geodesic distance matrix is interpreted as a kernel matrix and a constant-adding method is exploited so that the geodesic distance-based kernel becomes a Mercer kernel. The main advantages of the kernel Isomap can be summarized as follows: (a) a generalization property (i.e., test data points can be projected onto the feature space using the kernel trick, as in kernel PCA); (b) robustness to noisy data. The generalization property should make the kernel Isomap useful for pattern recognition problems.

Acknowledgment

This work was supported by the Korea Ministry of Science and Technology under the Brain Science and Engineering Research Program and the International Cooperative Research Program, and by KOSEF 2000-2-20500-009-5.

References
[1] T. Cox and M. Cox. Multidimensional Scaling. Chapman & Hall, 2nd edition, 2001.
[2] J. Ham, D. D. Lee, S. Mika, and B. Schölkopf. A kernel view of the dimensionality reduction of manifolds. In Proc. Int'l Conf. Machine Learning, pages 369-376, Banff, Canada, 2004.
[3] B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
[4] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.
[5] C. K. I. Williams. On a connection between kernel PCA and metric multidimensional scaling. Machine Learning, 46:11-19, 2002.

[Figure 1, panels (a)-(d); see caption below.]

Figure 1: Comparison of the original Isomap with our kernel Isomap for noisy Swiss roll data: (a) noisy Swiss roll data; (b) embedded points using the original Isomap; (c) embedded points using our kernel Isomap; (d) projection of test data points using the kernel Isomap. The constant-adding modification in the kernel Isomap improves the embedding while preserving local isometry (see (c)), and also allows test data points to be projected onto a feature space (see (d)).
