Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

Wenqi Zhang; Ke Tang; Hai Wu; Mengna Wang; Yongliang Shen; Guiyang Hou; Zeqi Tan; Peng Li; Yueting Zhuang; Weiming Lu

doi:10.18653/v1/2024.acl-long.292

Abstract

Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks. However, most LLM-based agents are designed as specific task solvers with sophisticated prompt engineering, rather than agents capable of learning and evolving through interactions. These task solvers require manually crafted prompts to convey task rules and regulate LLM behaviors, which leaves them inherently unable to address complex dynamic scenarios, e.g., large interactive games. In light of this, we propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization that can learn a wealth of expertise from interactive experiences and progressively elevate its behavioral policy. Specifically, it involves a dynamic belief generation and reflection process for policy evolution. Rather than reflecting at the action level, Agent-Pro iteratively reflects on past trajectories and beliefs, “fine-tuning” its irrational beliefs for a better policy. Moreover, a depth-first search is employed for policy optimization, ensuring continual enhancement in policy payoffs. Agent-Pro is evaluated across two games, Blackjack and Texas Hold’em, where it outperforms vanilla LLMs and specialized models. Our results show that Agent-Pro can learn and evolve in complex and dynamic scenes, which also benefits numerous LLM-based applications.

Anthology ID: 2024.acl-long.292
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 5348–5375
URL: https://aclanthology.org/2024.acl-long.292
DOI: 10.18653/v1/2024.acl-long.292
Cite (ACL): Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, and Weiming Lu. 2024. Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5348–5375, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization (Zhang et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.292.pdf
