Abstract
Discriminative techniques, such as conditional random fields (CRFs) or structure-aware maximum-margin techniques (maximum-margin Markov networks (M3N), structured output support vector machines (S-SVM)), are the state of the art in the prediction of structured data. However, to achieve good results, these techniques require complete and reliable ground truth, which is not always available in realistic problems. Furthermore, training either CRFs or margin-based techniques is computationally costly, because the runtime of current training methods depends not only on the size of the training set but also on properties of the output space to which the training samples are assigned.
We propose an alternative model for structured output prediction, Joint Kernel Support Estimation (JKSE), which is generative in nature, as it relies on estimating the joint probability density of samples and labels in the training set. This makes it tolerant of incomplete or incorrect labels and also opens the possibility of learning in situations where more than one output label can be considered correct.
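Concretely (in our own notation, not necessarily the paper's), the generative view predicts by maximizing the learned joint model over outputs, and JKSE replaces the density itself by a support estimate in the joint reproducing kernel Hilbert space:

\[
y^*(x) \;=\; \operatorname*{arg\,max}_{y \in \mathcal{Y}} \; \hat{p}(x, y) \;\approx\; \operatorname*{arg\,max}_{y \in \mathcal{Y}} \; \langle w, \phi(x, y) \rangle_{\mathcal{H}},
\]

where \(\phi\) is the joint feature map induced by the joint kernel and \(w\) is the solution of the one-class-SVM-style training problem described below.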
At the same time, we avoid typical problems of generative models, because we do not attempt to learn the full joint probability distribution but model only its support in a joint reproducing kernel Hilbert space. As a consequence, JKSE can be trained by an adaptation of the classical one-class SVM procedure. The resulting optimization problem is convex and efficiently solvable even with tens of thousands of training examples. A particular advantage of JKSE is that the training speed depends only on the size of the training set, not on the total size of the label space. No inference step is required during training (as it is for M3N and S-SVM), nor do we have to calculate a partition function (as CRFs do).
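To make the training recipe concrete, below is a minimal sketch of the idea (our illustration, not the authors' implementation): it treats a tiny multiclass problem as a trivial structured output space, builds explicit joint feature vectors, fits scikit-learn's OneClassSVM on the joint representations of the training pairs, and predicts by maximizing the resulting support score over labels. The function names, the joint feature map, and the parameter values (nu, gamma) are our assumptions.

```python
# Sketch of JKSE-style training and prediction on a toy multiclass problem.
import numpy as np
from sklearn.svm import OneClassSVM

def joint_feature(x, y, n_labels):
    """Hypothetical joint feature map: place the input features in the
    block corresponding to label y (a simple multiclass joint kernel)."""
    phi = np.zeros(len(x) * n_labels)
    phi[y * len(x):(y + 1) * len(x)] = x
    return phi

def train_jkse(X, Y, n_labels, nu=0.1, gamma=0.5):
    """Fit a one-class SVM on the joint representations of all training pairs."""
    Phi = np.array([joint_feature(x, y, n_labels) for x, y in zip(X, Y)])
    return OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(Phi)

def predict_jkse(model, x, n_labels):
    """Predict the label whose joint representation receives the highest support score."""
    scores = [model.decision_function(joint_feature(x, y, n_labels).reshape(1, -1))[0]
              for y in range(n_labels)]
    return int(np.argmax(scores))

# Toy usage: two Gaussian blobs with labels 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
Y = np.array([0] * 50 + [1] * 50)
model = train_jkse(X, Y, n_labels=2)
print(predict_jkse(model, np.array([4.0, 4.0]), n_labels=2))  # most likely 1
```

In a genuine structured setting the label space is too large to enumerate, so the argmax in predict_jkse would be replaced by a problem-specific inference procedure; the point of the sketch is that training itself never needs such a step.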
Experiments on realistic data show that, for suitable kernel functions, our method works efficiently and robustly in situations where discriminative techniques struggle or become computationally infeasible.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Lampert, C.H., Blaschko, M.B. Structured prediction by joint kernel support estimation. Mach Learn 77, 249–269 (2009). https://doi.org/10.1007/s10994-009-5111-0