Computer Science ›› 2022, Vol. 49 ›› Issue (9): 172-182.doi: 10.11896/jsjkx.210800112
• Artificial Intelligence •
XIONG Li-qin, CAO Lei, LAI Jun, CHEN Xi-liang