gBoost: a mathematical programming approach to graph classification and regression

Saigo, Hiroto; Nowozin, Sebastian; Kadowaki, Tadashi; Kudo, Taku; Tsuda, Koji

doi:10.1007/s10994-008-5089-z

Open access
Published: 12 November 2008

Machine Learning, Volume 75, pages 69–89 (2009)

Abstract

Graph mining methods enumerate frequently appearing subgraph patterns, which can be used as features for subsequent classification or regression. However, frequent patterns are not necessarily informative for the given learning problem. We propose a mathematical programming boosting method (gBoost) that progressively collects informative patterns. Compared to AdaBoost, gBoost can build the prediction rule with fewer iterations. To apply the boosting method to graph data, a branch-and-bound pattern search algorithm is developed based on the DFS code tree. The constructed search space is reused in later iterations to minimize the computation time. Our method can learn more efficiently than the simpler method based on frequent substructure mining, because the output labels are used as an extra information source for pruning the search space. Furthermore, by engineering the mathematical program, a wide range of machine learning problems can be solved without modifying the pattern search algorithm.
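
The label-driven pruning mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation, and it rests on several assumptions not stated above: item sets stand in for graphs (so the subset test replaces subgraph matching and a simple depth-first enumeration replaces the DFS code tree), each weak hypothesis is a stump of the form sign * (2 * [pattern occurs in x] - 1), and the nonnegative example weights are given rather than produced by a mathematical program. The names `gain`, `upper_bound`, and `best_pattern` are hypothetical.

```python
def gain(pattern, data, labels, weights, sign):
    """Weighted gain of the stump sign * (2 * [pattern <= x] - 1)."""
    return sum(
        u * y * sign * (1 if pattern <= x else -1)
        for x, y, u in zip(data, labels, weights)
    )


def upper_bound(pattern, data, labels, weights):
    """Bound on the gain of every superset of `pattern`, for either sign.

    Valid because weights are nonnegative and a superset can only occur
    in a subset of the examples in which `pattern` occurs.
    """
    total = sum(u * y for y, u in zip(labels, weights))
    pos = sum(u for x, y, u in zip(data, labels, weights) if y > 0 and pattern <= x)
    neg = sum(u for x, y, u in zip(data, labels, weights) if y < 0 and pattern <= x)
    return max(2 * pos - total, 2 * neg + total)


def best_pattern(data, labels, weights):
    """Depth-first branch-and-bound search for the highest-gain pattern."""
    items = sorted(set().union(*data))
    best = {"gain": float("-inf"), "pattern": None, "sign": 0, "visited": 0}

    def visit(pattern, start):
        best["visited"] += 1
        for sign in (1, -1):
            g = gain(pattern, data, labels, weights, sign)
            if g > best["gain"]:
                best.update(gain=g, pattern=pattern, sign=sign)
        # Prune: no superset of this pattern can beat the incumbent.
        if upper_bound(pattern, data, labels, weights) <= best["gain"]:
            return
        # Extend only with later items so each item set is visited once.
        for k in range(start, len(items)):
            visit(pattern | {items[k]}, k + 1)

    for k, item in enumerate(items):
        visit(frozenset([item]), k + 1)
    return best


# Toy data (made up for illustration): five "graphs" as item sets.
DATA = [frozenset("ab"), frozenset("abc"), frozenset("bc"),
        frozenset("c"), frozenset("ac")]
LABELS = [1, 1, -1, -1, -1]
WEIGHTS = [0.2] * 5

if __name__ == "__main__":
    result = best_pattern(DATA, LABELS, WEIGHTS)
    print(sorted(result["pattern"]), result["sign"], result["visited"])
```

In a boosting loop of the kind the abstract describes, a search like this would be rerun in each iteration with updated example weights; the sketch covers a single search only.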

Author information

Hiroto Saigo
Present address: Max Planck Institute for Informatics, Campus E1 4, 66123, Saarbrücken, Germany
Tadashi Kadowaki
Present address: Eisai Co., Ltd. 5-1-3, Tokodai, Tsukuba, Ibaraki, 300-2638, Japan

Authors and Affiliations

Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076, Tübingen, Germany
Hiroto Saigo, Sebastian Nowozin & Koji Tsuda
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, 611-0011, Japan
Tadashi Kadowaki
Google Japan Inc., Cerulean Tower 6F, 26-1 Sakuragaoka-cho, Shibuya-ku, Tokyo, 150-8512, Japan
Taku Kudo

Corresponding author

Correspondence to Hiroto Saigo.

Additional information

Editors: Thomas Gärtner and Gemma C. Garriga.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Saigo, H., Nowozin, S., Kadowaki, T. et al. gBoost: a mathematical programming approach to graph classification and regression. Mach Learn 75, 69–89 (2009). https://doi.org/10.1007/s10994-008-5089-z

Received: 09 March 2007
Revised: 02 October 2008
Accepted: 09 October 2008
Published: 12 November 2008
Issue Date: April 2009
DOI: https://doi.org/10.1007/s10994-008-5089-z