Abstract
This article proposes an active basis model, a shared sketch algorithm, and a computational architecture of sum-max maps for representing, learning, and recognizing deformable templates. In our generative model, a deformable template takes the form of an active basis, which consists of a small number of Gabor wavelet elements at selected locations and orientations. These elements are allowed to slightly perturb their locations and orientations before they are linearly combined to generate the observed image. The active basis model, in particular the locations and orientations of the basis elements, can be learned from training images by the shared sketch algorithm. The algorithm selects the elements of the active basis sequentially from a dictionary of Gabor wavelets. At each step, the selected element is shared by all the training images, and it is perturbed to encode, or sketch, a nearby edge segment in each training image. The deformable template can be recognized in an image by a computational architecture that alternates between sum maps and max maps. The computation of the max maps deforms the active basis to match the image data, and the computation of the sum maps scores the template match by the log-likelihood of the deformed active basis.
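To make the alternation of max maps and sum maps concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It assumes that Gabor filter responses have already been computed and stored as an array `s1` of shape (orientations, height, width). The function names `max_map` and `sum_map`, the square perturbation neighbourhood, and the plain weighted sum used in place of the full log-likelihood score are all illustrative assumptions.

```python
import numpy as np


def max_map(s1, loc_range=1, ori_range=0):
    """Local maximum of s1 over small shifts in location and orientation.

    s1 has shape (n_orientations, height, width). Assumption: the allowed
    perturbation is a square pixel neighbourhood plus a cyclic range of
    orientation steps, which simplifies the perturbation described above.
    """
    n_ori, h, w = s1.shape
    pad = loc_range
    padded = np.pad(s1.astype(float), ((0, 0), (pad, pad), (pad, pad)),
                    constant_values=-np.inf)
    out = np.full((n_ori, h, w), -np.inf)
    for d_ori in range(-ori_range, ori_range + 1):
        rolled = np.roll(padded, d_ori, axis=0)  # orientation is cyclic
        for dy in range(2 * pad + 1):
            for dx in range(2 * pad + 1):
                out = np.maximum(out, rolled[:, dy:dy + h, dx:dx + w])
    return out


def sum_map(m1, elements, weights):
    """Template score at every pixel: sum_i w_i * m1[o_i, y + dy_i, x + dx_i].

    elements is a list of hypothetical (dy, dx, orientation) offsets relative
    to the template centre. Out-of-image lookups contribute zero. The weights
    stand in for learned coefficients; normalizing terms are omitted.
    """
    _, h, w = m1.shape
    score = np.zeros((h, w))
    for (dy, dx, o), wt in zip(elements, weights):
        shifted = np.zeros((h, w))
        dst_y = slice(max(0, -dy), min(h, h - dy))
        dst_x = slice(max(0, -dx), min(w, w - dx))
        src_y = slice(max(0, dy), min(h, h + dy))
        src_x = slice(max(0, dx), min(w, w + dx))
        shifted[dst_y, dst_x] = m1[o, src_y, src_x]
        score += wt * shifted
    return score


# Toy example: a two-element template whose first element appears one pixel
# away from its nominal position in the image.
s1 = np.zeros((2, 9, 9))
s1[0, 4, 3] = 1.0  # nominal position for a template centred at (4, 4) is (4, 2)
s1[1, 4, 6] = 1.0  # exactly at its nominal position (4, 6)
elements = [(0, -2, 0), (0, 2, 1)]
weights = [1.0, 1.0]

rigid = sum_map(s1, elements, weights)                    # no deformation
deformed = sum_map(max_map(s1, 1, 0), elements, weights)  # max map, then sum map
print(rigid[4, 4], deformed[4, 4])
```

In this toy case the rigid template picks up only one of the two elements at the centre pixel, while the max map lets the displaced element shift by one pixel, so the deformed score is higher. A detection step could then take the location of the largest value in `deformed`, although ties are possible here because several neighbouring centres explain the same two responses.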
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Wu, Y.N., Si, Z., Gong, H. et al. Learning Active Basis Model for Object Detection and Recognition. Int J Comput Vis 90, 198–235 (2010). https://doi.org/10.1007/s11263-009-0287-0
DOI: https://doi.org/10.1007/s11263-009-0287-0