Computer Science and Information Systems 2012 Volume 9, Issue 2, Pages: 691-712
https://doi.org/10.2298/CSIS111211014T
Nearest neighbor voting in high dimensional data: Learning from past occurrences
Tomašev Nenad (Artificial Intelligence Laboratory, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia)
Mladenić Dunja (Artificial Intelligence Laboratory, Jožef Stefan Institute and Jožef Stefan International Postgraduate School, Ljubljana, Slovenia)
Hubness is a recently described aspect of the curse of dimensionality
inherent to nearest-neighbor methods. This paper describes a new approach for
exploiting the hubness phenomenon in k-nearest neighbor classification. We
argue that some neighbor occurrences carry more information than
others, by virtue of being less frequent events. This observation is
related to the hubness phenomenon and we explore how it affects
high-dimensional k-nearest neighbor classification. We propose a new
algorithm, Hubness Information k-Nearest Neighbor (HIKNN), which introduces
k-occurrence informativeness into the hubness-aware k-nearest neighbor
voting framework. The algorithm overcomes some of the issues with
previous hubness-aware approaches, as shown by an extensive evaluation
on several types of high-dimensional data.
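
The idea that rarer neighbor occurrences are more informative can be illustrated with a small sketch. This is not the HIKNN algorithm itself, whose details the abstract does not give. It only counts how often each point appears in the k-nearest-neighbor lists of the other points (its k-occurrence count) and turns that count into a surprisal-style weight, so that points which rarely occur as neighbors cast stronger votes than hubs. The function names, the add-one smoothing in the probability estimate, and the use of the neighbor's own label as its vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_indices(points, k):
    """For each point, return the indices of its k nearest other points
    (Euclidean distance; ties are broken by the lower index)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def k_occurrences(neighbors, n):
    """Count how many k-NN lists each of the n points appears in."""
    counts = Counter(j for lst in neighbors for j in lst)
    return [counts.get(i, 0) for i in range(n)]


def occurrence_surprisal(n_k, n):
    """Surprisal of seeing each point as a neighbor.

    Assumption: the occurrence probability is estimated as
    (N_k + 1) / (n + 1); the add-one smoothing keeps the logarithm
    finite for points that never occur as neighbors.
    """
    return [math.log((n + 1) / (c + 1)) for c in n_k]


def surprisal_weighted_vote(neighbor_ids, labels, surprisal):
    """Each neighbor votes for its own label, weighted by its surprisal.
    Hypothetical voting rule, used here only to show the effect."""
    votes = {}
    for j in neighbor_ids:
        votes[labels[j]] = votes.get(labels[j], 0.0) + surprisal[j]
    return max(votes, key=votes.get)
```

In a plain majority vote, a frequent neighbor (a hub) and a rare neighbor count equally. With the weights above, a query whose neighbors are one hub and one rarely occurring point is decided by the rare point.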