Sequential and Dynamic constraint Contrastive Learning for Reinforcement Learning
W Shen, L Yuan, J Huang, S Gao, Y Huang, Y Yu
2021 International Joint Conference on Neural Networks (IJCNN), ieeexplore.ieee.org
Contrastive unsupervised learning shows remarkable promise for improving sample efficiency in reinforcement learning, especially for high-dimensional observations, by extracting latent features from raw inputs. However, prior works rarely take sequential information and knowledge of dynamic transitions into account when constructing contrastive samples. In this paper, we propose Sequential and Dynamic constraint Contrastive Reinforcement Learning (SDCRL) to improve sample efficiency in the high-dimensional input (e.g., image) setting. We first construct a sequential contrastive module to extract latent features with sequential information from raw, correlated image inputs. Furthermore, we add a dynamic transition classification module to extract knowledge of state transitions. We validate the proposed method in the low-sample regime (few environment interactions). Our algorithm surpasses prior pixel-based approaches on complex tasks in the DeepMind Control Suite and even matches or exceeds the performance of a method that uses state-based features as inputs on 11 out of 15 tasks. On Atari 2600 games, SDCRL also outperforms strong baselines and achieves state-of-the-art performance on 7 out of 26 games.
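To make the two ideas in the abstract concrete, the sketch below illustrates (1) an InfoNCE-style contrastive loss, the standard objective in contrastive RL, applied to latent codes of sequential observation stacks, and (2) a logistic score for whether a pair of latents forms a real transition. This is a minimal toy illustration, not the authors' SDCRL implementation; the encoder, batch construction, and all names (`encode`, `infonce`, `transition_score`) are illustrative assumptions.

```python
# Hedged sketch of the abstract's two components (NOT the authors' code):
# (1) InfoNCE contrastive loss over sequential observation stacks, and
# (2) a toy dynamics head scoring whether (z, z') is a real transition.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8)) / 4.0  # toy linear "encoder" weights (assumed)

def encode(obs_stack):
    """Map a flattened stack of consecutive observations to a latent vector."""
    return obs_stack @ W

def infonce(anchors, positives, temperature=0.1):
    """InfoNCE: each anchor's positive is the matching row; all other rows act as negatives."""
    logits = anchors @ positives.T / temperature           # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                     # cross-entropy on the diagonal

def transition_score(z, z_next, v):
    """Logistic score that latent z -> z_next is a real environment transition."""
    return 1.0 / (1.0 + np.exp(-(np.concatenate([z, z_next]) @ v)))

# A batch of 4 observation stacks; positives are slightly perturbed views,
# standing in for temporally adjacent frames of the same trajectory.
obs = rng.normal(size=(4, 16))
obs_next = obs + 0.05 * rng.normal(size=(4, 16))
loss = infonce(encode(obs), encode(obs_next))

v = rng.normal(size=16)  # toy classifier weights (assumed)
s = transition_score(encode(obs)[0], encode(obs_next)[0], v)
```

In a full agent, both losses would be minimized jointly with the RL objective, so the encoder is shaped by sequential similarity and transition consistency at once; here the point is only the structure of the two objectives.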