Adiabatic persistent contrastive divergence learning

H Jang, H Choi, Y Yi, J Shin - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
2017 IEEE International Symposium on Information Theory (ISIT), 2017
This paper studies the problem of parameter learning in graphical models with latent variables, where the standard approach is the expectation-maximization (EM) algorithm, which alternates expectation (E) and maximization (M) steps. However, both the E and M steps are computationally intractable for high-dimensional data, and replacing either step with a faster surrogate to combat intractability can often cause convergence to fail. To tackle this issue, the Contrastive Divergence (CD) learning scheme has been widely used in the deep learning community; it runs the mean-field approximation in the E step and a few cycles of Markov chains (MCs) in the M step. In this paper, we analyze a variant of CD, called Adiabatic Persistent Contrastive Divergence (APCD), which runs a few cycles of MCs in both the E and M steps. Using multi-time-scale stochastic approximation theory, we prove that APCD converges to a correct optimum, whereas standard CD cannot have such a guarantee because of the mean-field approximation gap in the E step. Despite this stronger theoretical guarantee, a possible drawback of APCD in practice is slow mixing in the E step. To address this issue, we also design a hybrid approach that applies both mean-field and MC approximations in the E step, and it outperforms the standard mean-field-based CD in our experiments on real-world datasets.
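The idea of running Markov chains in both the E and M steps can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a small binary Boltzmann machine with hidden-to-hidden couplings (so that the E step is not available in closed form), and the names `apcd_like_step`, `gibbs_hidden`, `gibbs_visible`, and the single learning rate `lr` are hypothetical. In particular, a constant `lr` does not reproduce the step-size conditions of the multi-time-scale analysis.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_hidden(v, h, W, J, c, rng):
    """One sequential Gibbs sweep over the hidden units with v held fixed."""
    h = h.copy()
    for j in range(h.shape[1]):
        # J is symmetric with zero diagonal, so unit j never feeds itself.
        p = sigmoid(c[j] + v @ W[:, j] + h @ J[:, j])
        h[:, j] = (rng.random(h.shape[0]) < p).astype(float)
    return h

def gibbs_visible(h, W, b, rng):
    """Sample all visible units given the hidden units (no v-v couplings)."""
    p = sigmoid(b + h @ W.T)
    return (rng.random(p.shape) < p).astype(float)

def apcd_like_step(data, h_clamped, v_free, h_free, params, lr, rng):
    """One parameter update using persistent chains in both phases.

    Assumed energy: E(v, h) = -v'Wh - 0.5 h'Jh - b'v - c'h.
    """
    W, J, b, c = params
    # E-step surrogate: advance hidden chains clamped to the data.
    h_clamped = gibbs_hidden(data, h_clamped, W, J, c, rng)
    # M-step surrogate: advance free-running chains over (v, h).
    v_free = gibbs_visible(h_free, W, b, rng)
    h_free = gibbs_hidden(v_free, h_free, W, J, c, rng)
    n, m = data.shape[0], v_free.shape[0]
    # Stochastic gradient: clamped statistics minus free statistics.
    dW = data.T @ h_clamped / n - v_free.T @ h_free / m
    dJ = h_clamped.T @ h_clamped / n - h_free.T @ h_free / m
    np.fill_diagonal(dJ, 0.0)  # keep the zero diagonal of J
    db = data.mean(axis=0) - v_free.mean(axis=0)
    dc = h_clamped.mean(axis=0) - h_free.mean(axis=0)
    params = (W + lr * dW, J + lr * dJ, b + lr * db, c + lr * dc)
    return params, h_clamped, v_free, h_free
```

Both sets of chain states are returned and passed back in on the next call, so neither chain is restarted between parameter updates. Replacing the clamped `gibbs_hidden` sweep with a mean-field fixed-point iteration would give the mean-field E step that the abstract attributes to standard CD.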