Convergence Rates for Empirical Estimation of Binary Classification Bounds
Abstract
1. Introduction
1.1. Related Work
1.2. Organization
2. The Henze–Penrose Divergence Measure
2.1. The Multivariate Runs Test Statistic
2.2. Convergence Rates
2.3. Proof Sketch of Theorem 2
2.4. Concentration Bounds
3. Numerical Experiments
3.1. Simulation Study
3.2. Real Datasets
- Human Activity Recognition (HAR), Wearable Computing, Classification of Body Postures and Movements (PUC-Rio): This dataset contains five classes (sitting-down, standing-up, standing, walking, and sitting) collected over eight hours of activities of four healthy subjects.
- Skin Segmentation dataset (SKIN): The skin dataset was collected by randomly sampling B, G, R values from face images of various age groups (young, middle, and old), race groups (White, Black, and Asian), and genders obtained from the FERET and PAL databases [47].
- Sensorless Drive Diagnosis (ENGIN) dataset: In this dataset, features are extracted from electric current drive signals, where the drive has both intact and defective components. The dataset contains 11 classes corresponding to different component conditions, each measured several times under 12 different operating conditions, e.g., different speeds, load moments, and load forces.
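For datasets like these, the reported bounds are computed from the Friedman–Rafsky (FR) multivariate runs statistic of Section 2.1: build a Euclidean MST over the pooled two-class sample and count the edges joining points from different classes. The following minimal Python sketch uses Gaussian toy data in place of the real datasets and assumes equal class priors; the plug-in estimate $\widehat{D} = 1 - \mathfrak{R}_{m,n}(m+n)/(2mn)$ and the BER sandwich $\tfrac{1}{2} - \tfrac{1}{2}\sqrt{\widehat{D}} \le \varepsilon \le \tfrac{1}{2} - \tfrac{1}{2}\widehat{D}$ follow Berisha et al. [8].

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def fr_statistic(X, Y):
    """FR multivariate runs statistic: MST edges of the pooled sample
    that join the two classes (assumes no duplicate points, since zero
    distances read as missing edges in the sparse-graph routine)."""
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    mst = minimum_spanning_tree(squareform(pdist(Z))).tocoo()
    return int(np.sum(labels[mst.row] != labels[mst.col]))

def hp_divergence(X, Y):
    """Plug-in HP divergence estimate 1 - R_{m,n} (m+n) / (2 m n)."""
    m, n = len(X), len(Y)
    return 1.0 - fr_statistic(X, Y) * (m + n) / (2.0 * m * n)

# Gaussian toy data standing in for one of the real datasets.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(600, 3))
Y = rng.normal(1.5, 1.0, size=(600, 3))

D = hp_divergence(X, Y)
lower = 0.5 - 0.5 * np.sqrt(max(D, 0.0))  # BER lower bound (equal priors)
upper = 0.5 - 0.5 * D                     # BER upper bound
print(f"HP divergence ~ {D:.3f}; BER in [{lower:.3f}, {upper:.3f}]")
```

On real data such as HAR or SKIN, `X` and `Y` would simply be the feature matrices of the two classes; duplicated feature vectors should be removed first.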
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
HP | Henze–Penrose
BER | Bayes error rate
MST | Minimal spanning tree
FR | Friedman–Rafsky
MSE | Mean squared error
Appendix A. Proof of Theorem 4
Appendix B. Proof of Theorem 2
- (i) For a constant $c_d$ which depends only on $d$:
\[ \mathfrak{R}_{m,n}(\mathfrak{X}_m, \mathfrak{Y}_n) \le c_d\,(m+n). \]
- (ii) (Subadditivity on $[0,1]^d$ and superadditivity) Partition $[0,1]^d$ into $l^d$ subcubes $Q_i$ of side length $1/l$ such that $m_i$ and $n_i$ are the numbers of samples from $\mathfrak{X}_m$ and $\mathfrak{Y}_n$, respectively, falling into the partition cell $Q_i$, with dual $\mathfrak{R}^*_{m_i,n_i}$. Then, for a constant $C$ depending only on $d$, we have
\[ \mathfrak{R}_{m,n}(\mathfrak{X}_m, \mathfrak{Y}_n) \le \sum_{i=1}^{l^d} \mathfrak{R}_{m_i,n_i}\big(\mathfrak{X}_m \cap Q_i,\ \mathfrak{Y}_n \cap Q_i\big) + C\, l^{d-1}, \]
together with the corresponding superadditive lower bound for the dual $\mathfrak{R}^*_{m,n}$.
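Property (ii) can be sanity-checked numerically. The sketch below uses uniform toy samples on $[0,1]^2$ with $l = 4$; the data, the partition size, and the `fr_statistic` helper (same as in the Section 3.2 sketch) are illustrative assumptions, and the check bears only on the direction of the inequality, not on its constants.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

def fr_statistic(X, Y):  # same helper as in the Section 3.2 sketch
    if len(X) == 0 or len(Y) == 0:
        return 0  # a one-class (or empty) cell has no dichotomous edges
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    mst = minimum_spanning_tree(squareform(pdist(Z))).tocoo()
    return int(np.sum(labels[mst.row] != labels[mst.col]))

d, m, n, l = 2, 500, 500, 4
X, Y = rng.random((m, d)), rng.random((n, d))  # uniform on [0,1]^2

# Index of the subcube of side 1/l containing each point.
def cell_index(P):
    return (np.floor(P * l).clip(0, l - 1) @ (l ** np.arange(d))).astype(int)

cx, cy = cell_index(X), cell_index(Y)
R_whole = fr_statistic(X, Y)
R_cells = sum(fr_statistic(X[cx == i], Y[cy == i]) for i in range(l ** d))
# The gap R_whole - R_cells is what the O(l^{d-1}) boundary term absorbs.
print(R_whole, R_cells, R_whole - R_cells)
```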
Appendix C. Proof of Theorem 3
Appendix D. Proof of Theorems 5–7
d | Concentration Bound (11) | Optimal (11)
---|---|---
2 | 0.3439 |
4 | 168,070 | 0.0895
5 | 550 | 0.9929
6 | 0.1637 |
8 | 1200 | 0.7176
10 | 3500 | 0.4795
15 | 0.9042 |
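Bounds like (11) can be complemented by a direct Monte Carlo check of how tightly the FR statistic concentrates around its mean. A minimal sketch follows; the uniform toy distributions, trial count, and deviation threshold `eps` are illustrative assumptions, and the bound (11) itself is not reproduced here.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)

def fr_statistic(X, Y):  # same helper as in the Section 3.2 sketch
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    mst = minimum_spanning_tree(squareform(pdist(Z))).tocoo()
    return int(np.sum(labels[mst.row] != labels[mst.col]))

d, m, n, trials, eps = 2, 200, 200, 200, 0.05
stats = np.array([fr_statistic(rng.random((m, d)), rng.random((n, d)))
                  for _ in range(trials)], dtype=float)
# Empirical probability of a normalized deviation larger than eps.
dev = np.abs(stats - stats.mean()) / (m + n)
print("empirical P(|R - ER|/(m+n) > eps):", np.mean(dev > eps))
```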
Appendix E. Additional Proofs
References
- Xuan, G.; Chia, P.; Wu, M. Bhattacharyya distance feature selection. In Proceedings of the 13th International Conference on Pattern Recognition, Vienna, Austria, 25–29 August 1996; Volume 2, pp. 195–199.
- Hamza, A.; Krim, H. Image registration and segmentation by maximizing the Jensen–Rényi divergence. In Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 147–163.
- Hild, K.E.; Erdogmus, D.; Principe, J. Blind source separation using Rényi's mutual information. IEEE Signal Process. Lett. 2001, 8, 174–176.
- Basseville, M. Divergence measures for statistical data processing–An annotated bibliography. Signal Process. 2013, 93, 621–633.
- Bhattacharyya, A. On a measure of divergence between two multinomial populations. Sankhyā Indian J. Stat. 1946, 7, 401–406.
- Lin, J. Divergence measures based on the Shannon entropy. IEEE Trans. Inf. Theory 1991, 37, 145–151.
- Berisha, V.; Hero, A. Empirical non-parametric estimation of the Fisher information. IEEE Signal Process. Lett. 2015, 22, 988–992.
- Berisha, V.; Wisler, A.; Hero, A.; Spanias, A. Empirically estimable classification bounds based on a nonparametric divergence measure. IEEE Trans. Signal Process. 2016, 64, 580–591.
- Moon, K.; Hero, A. Multivariate f-divergence estimation with confidence. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; pp. 2420–2428.
- Moon, K.; Hero, A. Ensemble estimation of multivariate f-divergence. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 356–360.
- Moon, K.; Sricharan, K.; Greenewald, K.; Hero, A. Improving convergence of divergence functional ensemble estimators. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 1133–1137.
- Moon, K.; Sricharan, K.; Greenewald, K.; Hero, A. Nonparametric ensemble estimation of distributional functionals. arXiv 2016, arXiv:1601.06884v2.
- Noshad, M.; Moon, K.; Yasaei Sekeh, S.; Hero, A. Direct estimation of information divergence using nearest neighbor ratios. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017.
- Yasaei Sekeh, S.; Oselio, B.; Hero, A. A dimension-independent discriminant between distributions. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018.
- Noshad, M.; Hero, A. Rate-optimal meta learning of classification error. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018.
- Wisler, A.; Berisha, V.; Wei, D.; Ramamurthy, K.; Spanias, A. Empirically-estimable multi-class classification bounds. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016.
- Yukich, J. Probability Theory of Classical Euclidean Optimization; Lecture Notes in Mathematics; Springer: Berlin, Germany, 1998; Volume 1675.
- Steele, J. An Efron–Stein inequality for nonsymmetric statistics. Ann. Stat. 1986, 14, 753–758.
- Aldous, D.; Steele, J.M. Asymptotics for Euclidean minimal spanning trees on random points. Probab. Theory Relat. Fields 1992, 92, 247–258.
- Ma, B.; Hero, A.; Gorman, J.; Michel, O. Image registration with minimal spanning tree algorithm. In Proceedings of the IEEE International Conference on Image Processing, Vancouver, BC, Canada, 10–13 September 2000; pp. 481–484.
- Neemuchwala, H.; Hero, A.; Carson, P. Image registration using entropy measures and entropic graphs. Eur. J. Signal Process. 2005, 85, 277–296.
- Hero, A.; Ma, B.; Michel, O.J.; Gorman, J. Applications of entropic spanning graphs. IEEE Signal Process. Mag. 2002, 19, 85–95.
- Hero, A.; Michel, O. Estimation of Rényi information divergence via pruned minimal spanning trees. In Proceedings of the IEEE Workshop on Higher Order Statistics, Caesarea, Israel, 16 June 1999.
- Smirnov, N. On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bull. Mosc. Univ. 1939, 2, 3–6.
- Wald, A.; Wolfowitz, J. On a test whether two samples are from the same population. Ann. Math. Stat. 1940, 11, 147–162.
- Gibbons, J. Nonparametric Statistical Inference; McGraw-Hill: New York, NY, USA, 1971.
- Steele, J.M. Probability Theory and Combinatorial Optimization; CBMS-NSF Regional Conference Series in Applied Mathematics; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1997; Volume 69.
- Redmond, C.; Yukich, J. Limit theorems and rates of convergence for Euclidean functionals. Ann. Appl. Probab. 1994, 4, 1057–1073.
- Redmond, C.; Yukich, J. Asymptotics for Euclidean functionals with power-weighted edges. Stoch. Process. Their Appl. 1996, 61, 289–304.
- Hero, A.; Costa, J.; Ma, B. Convergence Rates of Minimal Graphs with Random Vertices. Available online: https://pdfs.semanticscholar.org/7817/308a5065aa0dd44098319eb66f81d4fa7a14.pdf (accessed on 18 November 2019).
- Hero, A.; Costa, J.; Ma, B. Asymptotic Relations between Minimal Graphs and Alpha-Entropy; Tech. Rep.; Communication and Signal Processing Laboratory (CSPL), Department EECS, University of Michigan: Ann Arbor, MI, USA, 2003.
- Lorentz, G. Approximation of Functions; Holt, Rinehart and Winston: New York, NY, USA, 1966.
- Talagrand, M. Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l'I.H.É.S. 1995, 81, 73–205.
- Kullback, S.; Leibler, R. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1961; pp. 547–561.
- Ali, S.; Silvey, S.D. A general class of coefficients of divergence of one distribution from another. J. R. Stat. Soc. Ser. B (Methodol.) 1966, 28, 131–142.
- Cha, S. Comprehensive survey on distance/similarity measures between probability density functions. Int. J. Math. Models Methods Appl. Sci. 2007, 1, 300–307.
- Rukhin, A. Optimal estimator for the mixture parameter by the method of moments and information affinity. In Proceedings of the 12th Prague Conference on Information Theory, Prague, Czech Republic, 29 August–2 September 1994; pp. 214–219.
- Toussaint, G. The relative neighborhood graph of a finite planar set. Pattern Recognit. 1980, 12, 261–268.
- Zahn, C. Graph-theoretical methods for detecting and describing Gestalt clusters. IEEE Trans. Comput. 1971, 100, 68–86.
- Banks, D.; Lavine, M.; Newton, H. The minimal spanning tree for nonparametric regression and structure discovery. In Computing Science and Statistics, Proceedings of the 24th Symposium on the Interface; Joseph Newton, H., Ed.; Interface Foundation of North America: Fairfax Station, VA, USA, 1992; pp. 370–374.
- Hoffman, R.; Jain, A. A test of randomness based on the minimal spanning tree. Pattern Recognit. Lett. 1983, 1, 175–180.
- Efron, B.; Stein, C. The jackknife estimate of variance. Ann. Stat. 1981, 9, 586–596.
- Singh, S.; Póczos, B. Generalized exponential concentration inequality for Rényi divergence estimation. In Proceedings of the 31st International Conference on Machine Learning (ICML-14), Beijing, China, 22–24 June 2014; pp. 333–341.
- Singh, S.; Póczos, B. Exponential concentration of a density functional estimator. In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; pp. 3032–3040.
- Lichman, M. UCI Machine Learning Repository. 2013. Available online: https://www.re3data.org/repository/r3d100010960 (accessed on 18 November 2019).
- Bhatt, R.B.; Sharma, G.; Dhall, A.; Chaudhury, S. Efficient skin region segmentation using low complexity fuzzy decision tree model. In Proceedings of the IEEE-INDICON, Ahmedabad, India, 16–18 December 2009; pp. 1–4.
- Steele, J.; Shepp, L.; Eddy, W. On the number of leaves of a Euclidean minimal spanning tree. J. Appl. Probab. 1987, 24, 809–826.
- Henze, N.; Penrose, M. On the multivariate runs test. Ann. Stat. 1999, 27, 290–298.
- Rhee, W. A matching problem and subadditive Euclidean functionals. Ann. Appl. Probab. 1993, 3, 794–801.
- Whittaker, E.; Watson, G. A Course in Modern Analysis, 4th ed.; Cambridge University Press: New York, NY, USA, 1996.
- Kingman, J. Poisson Processes; Oxford University Press: Oxford, UK, 1993.
- Pál, D.; Póczos, B.; Szepesvári, C. Estimation of Rényi entropy and mutual information based on generalized nearest-neighbor graphs. In Proceedings of the 23rd International Conference on Neural Information Processing Systems (NIPS 2010), Vancouver, BC, Canada, 6–9 December 2010.
Dataset | FR Test Statistic | | | | Variance-Like Interval
---|---|---|---|---|---
HAR | 3 | 0.995 | 600 | 600 | (2.994, 3.006)
SKIN | 4.2 | 0.993 | 600 | 600 | (4.196, 4.204)
ENGIN | 1.8 | 0.997 | 600 | 600 | (1.798, 1.802)
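A variance-like interval of the kind reported above can be approximated by recomputing the FR statistic over repeated random subsamples of 600 points per class and reporting its empirical mean together with a 99.5% quantile band. The sketch below encodes that protocol; the subsampling scheme, the trial count, and the omission of the paper's normalization of the statistic are assumptions of this illustration, not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(2)

def fr_statistic(X, Y):  # same helper as in the Section 3.2 sketch
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    mst = minimum_spanning_tree(squareform(pdist(Z))).tocoo()
    return int(np.sum(labels[mst.row] != labels[mst.col]))

def fr_interval(X_all, Y_all, m=600, n=600, trials=50, level=0.995):
    """Mean and empirical `level` quantile band of the FR statistic
    over random subsamples (paper's normalization omitted here)."""
    vals = np.array([
        fr_statistic(X_all[rng.choice(len(X_all), m, replace=False)],
                     Y_all[rng.choice(len(Y_all), n, replace=False)])
        for _ in range(trials)
    ], dtype=float)
    a = (1.0 - level) / 2.0
    return vals.mean(), np.quantile(vals, [a, 1.0 - a])

# Synthetic stand-in for a real dataset's two classes.
X_all = rng.normal(0.0, 1.0, size=(2000, 3))
Y_all = rng.normal(1.0, 1.0, size=(2000, 3))
mean, band = fr_interval(X_all, Y_all)
print(mean, band)
```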
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).