K-means is one of the most widely used clustering algorithms and is applied in a wide range of settings, but how do we know that a data set is suited for it? The assumptions k-means makes about the data are relatively strict: clusters should be Gaussian distributed with uniform variance in all directions. These assumptions are rarely satisfied in practice. Clusters that do not deviate too far from them can still be separated with sufficient precision, but the farther the data departs from these assumptions, the more likely k-means is to fail. Instead of testing whether the assumptions are met and k-means can be applied, we make it so. Our goal is to improve the suitability of data sets for k-means and to widen the range of data sets it can be applied to. Our algorithm changes the positions of data points so that the clusters become more compact and thus better fit the requirements of k-means. Based on cluster-wise PCA and a local Z-transformation, we estimate the shape of the correct clusters and move the data points so that these clusters become more compact with each iteration and, in the end, have uniform variance, while the distance between clusters increases. We explain the theory behind our approach and validate it with extensive experiments on various real-world data sets.
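The core idea of one iteration can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: for each cluster, PCA (via SVD) finds the principal axes, and a Z-transformation in that local coordinate system rescales each axis toward unit variance, making the cluster more compact and isotropic. The function name, the `step` parameter, and the choice to use given labels are all assumptions for the sketch.

```python
import numpy as np

def compress_clusters(X, labels, step=0.5):
    """One iteration of the sketched idea: move each cluster's points
    so the cluster becomes more compact and isotropic.
    step=1.0 makes each cluster's variance uniform in one shot;
    smaller steps move points only part of the way."""
    X_new = X.astype(float).copy()
    for c in np.unique(labels):
        pts = X[labels == c].astype(float)
        center = pts.mean(axis=0)
        centered = pts - center
        # Cluster-wise PCA via SVD of the centered points;
        # rows of Vt are the principal axes of this cluster.
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        scores = centered @ Vt.T          # coordinates in the PCA basis
        # Local Z-transformation: rescale each principal axis
        # toward unit variance (guard against degenerate axes).
        std = scores.std(axis=0)
        std[std == 0] = 1.0
        target = scores / std
        moved = scores + step * (target - scores)
        X_new[labels == c] = center + moved @ Vt
    return X_new
```

On a strongly elongated Gaussian cluster, one call with `step=1.0` yields a cluster with identical variance along every direction, which is exactly the shape k-means assumes.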