Multi-Agent Applications
with Evolutionary
Computation and
Biologically Inspired
Technologies:
Intelligent Techniques for
Ubiquity and Optimization
Shu-Heng Chen
National Chengchi University, Taiwan
Yasushi Kambayashi
Nippon Institute of Technology, Japan
Hiroshi Sato
National Defense Academy, Japan
Copyright © 2011 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Multi-agent applications with evolutionary computation and biologically inspired technologies : intelligent techniques for
ubiquity and optimization / Yasushi Kambayashi, editor.
p. cm.
Includes bibliographical references and index.
Summary: "This book compiles numerous ongoing projects and research efforts in the design of agents in light of recent development in neurocognitive science and quantum physics, providing readers with interdisciplinary applications of multi-agents systems, ranging from economics to engineering" -- Provided by publisher.
ISBN 978-1-60566-898-7 (hardcover) -- ISBN 978-1-60566-899-4 (ebook)
1. Multiagent systems. 2. Evolutionary computation. I. Kambayashi, Yasushi, 1958-
QA76.76.I58M78 2010
006.3'2--dc22
2010011642
All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
Table of Contents
Acknowledgment.............................................................................................................................xxviii
Section 1
Multi-Agent Financial Decision Systems
Chapter 1
A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index................................... 1
Mak Kaboudan, University of Redlands, USA
Chapter 2
An Agent-Based Model for Portfolio Optimizations Using Search Space Splitting............................. 19
Yukiko Orito, Hiroshima University, Japan
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hisashi Yamamoto, Tokyo Metropolitan University, Japan
Section 2
Neuro-Inspired Agents
Chapter 3
Neuroeconomics: A Viewpoint from Agent-Based Computational Economics.................................... 35
Shu-Heng Chen, National Chengchi University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan
Chapter 4
Agents in Quantum and Neural Uncertainty.......................................................................................... 50
Germano Resconi, Catholic University Brescia, Italy
Boris Kovalerchuk, Central Washington University, USA
Section 3
Bio-Inspired Agent-Based Artificial Markets
Chapter 5
Bounded Rationality and Market Micro-Behaviors: Case Studies Based on Agent-Based
Double Auction Markets........................................................................................................................ 78
Shu-Heng Chen, National Chengchi University, Taiwan
Ren-Jie Zeng, Taiwan Institute of Economic Research, Taiwan
Tina Yu, Memorial University of Newfoundland, Canada
Shu G. Wang, National Chengchi University, Taiwan
Chapter 6
Social Simulation with Both Human Agents and Software Agents: An Investigation into
the Impact of Cognitive Capacity on Their Learning Behavior............................................................ 95
Shu-Heng Chen, National Chengchi University, Taiwan
Chung-Ching Tai, Tunghai University, Taiwan
Tzai-Der Wang, Cheng Shiu University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan
Chapter 7
Evolution of Agents in a Simple Artificial Market.............................................................................. 118
Hiroshi Sato, National Defense Academy, Japan
Masao Kubo, National Defense Academy, Japan
Akira Namatame, National Defense Academy, Japan
Chapter 8
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets...................... 134
Hiroshi Takahashi, Keio University, Japan
Takao Terano, Tokyo Institute of Technology, Japan
Section 4
Multi-Agent Robotics
Chapter 9
Autonomous Specialization in a Multi-Robot System using Evolving Neural Networks................... 156
Masanori Goka, Hyogo Prefectural Institute of Technology, Japan
Kazuhiro Ohkura, Hiroshima University, Japan
Chapter 10
A Multi-Robot System Using Mobile Agents with Ant Colony Clustering........................................ 174
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hidemi Yamachi, Nippon Institute of Technology, Japan
Munehiro Takimoto, Tokyo University of Science, Japan
Section 5
Multi-Agent Games and Simulations
Chapter 11
The AGILE Design of Reality Games AI............................................................................................ 193
Robert G. Reynolds, Wayne State University, USA
John O’Shea, University of Michigan-Ann Arbor, USA
Xiangdong Che, Wayne State University, USA
Yousof Gawasmeh, Wayne State University, USA
Guy Meadows, University of Michigan-Ann Arbor, USA
Farshad Fotouhi, Wayne State University, USA
Chapter 12
Management of Distributed Energy Resources Using Intelligent Multi-Agent System...................... 208
Thillainathan Logenthiran, National University of Singapore, Singapore
Dipti Srinivasan, National University of Singapore, Singapore
Section 6
Multi-Agent Learning
Chapter 13
Effects of Shaping a Reward on Multiagent Reinforcement Learning................................................ 232
Sachiyo Arai, Chiba University, Japan
Chapter 14
Swarm Intelligence Based Reputation Model for Open Multiagent Systems..................................... 248
Saba Mahmood, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan
Azzam ul Asar, Department of Electrical and Electronics Engineering, NWFP
University of Engineering and Technology, Pakistan
Hiroki Suguri, Miyagi University, Japan
Hafiz Farooq Ahmad, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan
Chapter 15
Exploitation-Oriented Learning XoL - A New Approach to Machine Learning
Based on Trial-and-Error Searches...................................................................................................... 267
Kazuteru Miyazaki, National Institution for Academic Degrees
and University Evaluation, Japan
Section 7
Miscellaneous
Chapter 16
Pheromone-Style Communication for Swarm Intelligence................................................................. 294
Hidenori Kawamura, Hokkaido University, Japan
Keiji Suzuki, Hokkaido University, Japan
Chapter 17
Evolutionary Search for Cellular Automata with Self-Organizing Properties
toward Controlling Decentralized Pervasive Systems......................................................................... 308
Yusuke Iwase, Nagoya University, Japan
Reiji Suzuki, Nagoya University, Japan
Takaya Arita, Nagoya University, Japan
Index.................................................................................................................................................... 349
Detailed Table of Contents
Acknowledgment.............................................................................................................................xxviii
Section 1
Multi-Agent Financial Decision Systems
Chapter 1
A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index................................... 1
Mak Kaboudan, University of Redlands, USA
Successful decision-making by home-owners, lending institutions, and real estate developers among
others is dependent on obtaining reasonable forecasts of residential home prices. For decades, home-
price forecasts were produced by agents utilizing academically well-established statistical models. In
this chapter, several modeling agents will compete and cooperate to produce a single forecast. A cooper-
ative multi-agent system (MAS) is developed and used to obtain monthly forecasts (April 2008 through
March 2010) of the S&P/Case-Shiller home price index for Los Angeles, CA (LXXR). Monthly hous-
ing market demand and supply variables including conventional 30-year fixed real mortgage rate, real
personal income, cash out loans, homes for sale, change in housing inventory, and construction material
price index are used to find different independent models that explain percentage change in LXXR. An
agent then combines the forecasts obtained from the different models to obtain a final prediction.
Chapter 2
An Agent-Based Model for Portfolio Optimizations Using Search Space Splitting............................. 19
Yukiko Orito, Hiroshima University, Japan
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hisashi Yamamoto, Tokyo Metropolitan University, Japan
Portfolio optimization is the determination of the weights of assets to be included in a portfolio in order to achieve the investment objective. It can be viewed as a tight combinatorial optimization problem that has many solutions near the optimal solution in a narrow solution space. In order to solve such a tight problem, this chapter introduces an agent-based model. The authors employ the Information Ratio, a well-known measure of the performance of actively managed portfolios, as the objective function. Each agent carries, as a set of properties, one portfolio, its Information Ratio, and its behavioral character. The evolution of these agent properties splits the search space into many small subspaces. The population of each small subspace contains one leader agent and several follower agents. As the processing of the populations progresses, the agent properties change through the interaction between the leader and the followers, and when the iterations are over, the authors obtain one leader with the highest Information Ratio.
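To make the search-space-splitting idea concrete, the following sketch evolves sub-populations of portfolio agents, each led by the agent with the best Information Ratio. It is only an illustrative approximation of the mechanism described above, not the authors' implementation; the return data, the blend-and-mutate update rule, and all parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_agents, n_groups, n_iters = 8, 40, 4, 200
# Hypothetical monthly asset and benchmark returns, for illustration only.
asset_ret = rng.normal(0.01, 0.05, size=(120, n_assets))
bench_ret = asset_ret.mean(axis=1)

def information_ratio(w):
    """Mean active return over its standard deviation (a simplified proxy)."""
    active = asset_ret @ w - bench_ret
    return active.mean() / (active.std() + 1e-12)

def normalize(w):
    w = np.clip(w, 0.0, None)
    return w / w.sum()

# Each agent holds one portfolio (its "property"); agents are split into groups,
# so the population searches many small subspaces in parallel.
agents = [normalize(rng.random(n_assets)) for _ in range(n_agents)]
groups = np.array_split(np.arange(n_agents), n_groups)

for _ in range(n_iters):
    for idx in groups:
        # The best agent in each small subspace acts as the leader.
        leader = max(idx, key=lambda i: information_ratio(agents[i]))
        for i in idx:
            if i == leader:
                continue
            # Followers blend their portfolio toward the leader's and mutate slightly.
            blend = 0.5 * agents[i] + 0.5 * agents[leader]
            agents[i] = normalize(blend + rng.normal(0, 0.02, n_assets))

best = max(range(n_agents), key=lambda i: information_ratio(agents[i]))
print("best Information Ratio:", round(information_ratio(agents[best]), 3))
```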
Section 2
Neuro-Inspired Agents
Chapter 3
Neuroeconomics: A Viewpoint from Agent-Based Computational Economics.................................... 35
Shu-Heng Chen, National Chengchi University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan
Recently, the relation between neuroeconomics and agent-based computational economics (ACE) has become an issue of concern to the agent-based economics community. Neuroeconomics can interest agent-based economists when they inquire into the foundations or principles of software-agent design, normally known as agent engineering. It has been shown in many studies that the design of software agents is non-trivial and can determine what will emerge from the bottom up. Therefore, the question of whether we can sensibly design these software agents, including both the choice of software agent models, such as reinforcement learning, and the parameter settings associated with the chosen model, such as risk attitude, has been pursued for quite some time. This chapter starts a formal inquiry by focusing on examining the models and parameters used to build software agents.
Chapter 4
Agents in Quantum and Neural Uncertainty.......................................................................................... 50
Germano Resconi, Catholic University Brescia, Italy
Boris Kovalerchuk, Central Washington University, USA
This chapter models quantum and neural uncertainty using a concept of the Agent–based Uncertainty
Theory (AUT). The AUT is based on complex fusion of crisp (non-fuzzy) conflicting judgments of
agents. It provides a uniform representation and an operational empirical interpretation for several un-
certainty theories such as rough set theory, fuzzy sets theory, evidence theory, and probability theory.
The AUT models conflicting evaluations that are fused in the same evaluation context. This agent approach also gives a novel definition of quantum uncertainty and of quantum computations for quantum gates that are realized by unitary transformations of the state. In the AUT approach, unitary matrices
are interpreted as logic operations in logic computations. The authors show that by using permutation
operators any type of complex classical logic expression can be generated. With the quantum gate, the
authors introduce classical logic into the quantum domain. This chapter connects the intrinsic irratio-
nality of the quantum system and the non-classical quantum logic with the agents. The authors argue
that AUT can help to find meaning for quantum superposition of non-consistent states. Next, this chap-
ter shows that the neural fusion at the synapse can be modeled by the AUT in the same fashion. The
neuron is modeled as an operator that transforms classical logic expressions into many-valued logic
expressions. The motivation for such a neural network is to provide high flexibility and logic adaptation
of the brain model.
Section 3
Bio-Inspired Agent-Based Artificial Markets
Chapter 5
Bounded Rationality and Market Micro-Behaviors: Case Studies Based on Agent-Based
Double Auction Markets........................................................................................................................ 78
Shu-Heng Chen, National Chengchi University, Taiwan
Ren-Jie Zeng, Taiwan Institute of Economic Research, Taiwan
Tina Yu, Memorial University of Newfoundland, Canada
Shu G. Wang, National Chengchi University, Taiwan
This chapter investigates the dynamics of trader behaviors using an agent-based genetic programming
system to simulate double-auction markets. The objective of this study is two-fold. First, the authors
seek to evaluate how, if at all, differences in trader rationality/intelligence influence trading behavior. Second, besides rationality, they also analyze how, if at all, the co-evolution between two learnable
traders impacts their trading behaviors. The authors have found that traders with different degrees of
rationality may exhibit different behavior depending on the type of market they are in. When the market
has a profit zone to explore, the more intelligent trader demonstrates more intelligent behaviors. Also,
when the market has two learnable buyers, their co-evolution produced more profitable transactions
than when there was only one learnable buyer in the market. The authors have analyzed the trading
strategies and found that the learning behaviors are very similar to those of humans in decision-making. They plan
to conduct human subject experiments to validate these results in the near future.
Chapter 6
Social Simulation with Both Human Agents and Software Agents: An Investigation into
the Impact of Cognitive Capacity on Their Learning Behavior............................................................ 95
Shu-Heng Chen, National Chengchi University, Taiwan
Chung-Ching Tai, Tunghai University, Taiwan
Tzai-Der Wang, Cheng Shiu University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan
This chapter presents agent-based simulations as well as human experiments in double auction markets.
The authors’ idea is to investigate the learning capabilities of human traders by studying learning agents
constructed by Genetic Programming (GP), and the latter can further serve as a design platform in
conducting human experiments. By manipulating the population size of GP traders, the authors attempt
to characterize the innate heterogeneity in human beings' intellectual abilities. They find that GP trad-
ers are efficient in the sense that they can beat other trading strategies even with very limited learning
capacity. A series of human experiments and multi-agent simulations are conducted and compared for
an examination at the end of this chapter.
Chapter 7
Evolution of Agents in a Simple Artificial Market.............................................................................. 118
Hiroshi Sato, National Defense Academy, Japan
Masao Kubo, National Defense Academy, Japan
Akira Namatame, National Defense Academy, Japan
This chapter conducts a comparative study of various traders following different trading strategies. The
authors design an agent-based artificial stock market consisting of two opposing types of traders: “ra-
tional traders” (or “fundamentalists”) and “imitators” (or “chartists”). Rational traders trade by trying to
optimize their short-term income. On the other hand, imitators trade by copying the majority behavior
of rational traders. The authors obtain the wealth distribution for different fractions of rational traders
and imitators. When rational traders are in the minority, they can come to dominate imitators in terms
of accumulated wealth. On the other hand, when rational traders are in the majority and imitators are
in the minority, imitators can come to dominate rational traders in terms of accumulated wealth. The
authors show that survival in a financial market is a kind of minority game between behavioral types, rational
traders and imitators. The coexistence of rational traders and imitators in different combinations may
explain the market’s complex behavior as well as the success or failure of various trading strategies.
Chapter 8
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets...................... 134
Hiroshi Takahashi, Keio University, Japan
Takao Terano, Tokyo Institute of Technology, Japan
This chapter describes advances of agent-based models to financial market analyses based on the au-
thors’ recent research. The authors have developed several agent-based models to analyze microscopic
and macroscopic links between investor behaviors and price fluctuations in a financial market. The
models are characterized by the methodology that analyzes the relations among micro-level decision
making rules of the agents and macro-level social behaviors via computer simulations. In this chap-
ter, the authors report the outline of recent results of their analysis. From the extensive analyses, they
have found that (1) investors' overconfidence behaviors play various roles in a financial market, (2) overconfident investors emerge in a bottom-up fashion in the market, (3) they contribute to efficient trades in the market, which adequately reflects fundamental values, (4) the passive investment strategy is valid in a realistic efficient market; however, it could have adverse influences such as market instability and inadequate asset-pricing deviations, and (5) under certain assumptions, the passive investment
strategy and active investment strategy could coexist in a financial market.
Section 4
Multi-Agent Robotics
Chapter 9
Autonomous Specialization in a Multi-Robot System using Evolving Neural Networks................... 156
Masanori Goka, Hyogo Prefectural Institute of Technology, Japan
Kazuhiro Ohkura, Hiroshima University, Japan
Artificial evolution has been considered as a promising approach for coordinating the controller of
an autonomous mobile robot. However, it is not yet established whether artificial evolution is also ef-
fective in generating collective behaviour in a multi-robot system (MRS). In this study, two types of
evolving artificial neural networks are utilized in an MRS. The first is the evolving continuous-time recurrent neural network, which is used in the conventional method, and the second is the topology-and-weight-evolving artificial neural network, which is used in the novel method. Several computer
simulations are conducted in order to examine how the artificial evolution can be used to coordinate the
collective behaviour in an MRS.
Chapter 10
A Multi-Robot System Using Mobile Agents with Ant Colony Clustering........................................ 174
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hidemi Yamachi, Nippon Institute of Technology, Japan
Munehiro Takimoto, Tokyo University of Science, Japan
This chapter presents a framework using novel methods for controlling multiple mobile robots directed by mobile agents over a communication network. Instead of the physical movement of multiple robots,
mobile software agents migrate from one robot to another so that the robots more efficiently complete
their task. In some applications, it is desirable that multiple robots draw themselves together automati-
cally. In order to avoid excessive energy consumption, the authors employ mobile software agents to
locate robots scattered in a field, and cause them to autonomously determine their moving behaviors
by using a clustering algorithm based on the Ant Colony Optimization (ACO) method. ACO is a
swarm-intelligence-based method that exploits artificial stigmergy for the solution of combinatorial
optimization problems. Preliminary experiments have provided a favorable result. Even though there
is much room to improve the collaboration of multiple agents and ACO, the current results suggest a
promising direction for the design of control mechanisms for multi-robot systems. This chapter focuses
on the implementation of the controlling mechanism of the multi-robot system using mobile agents.
Section 5
Multi-Agent Games and Simulations
Chapter 11
The AGILE Design of Reality Games AI............................................................................................ 193
Robert G. Reynolds, Wayne State University, USA
John O’Shea, University of Michigan-Ann Arbor, USA
Xiangdong Che, Wayne State University, USA
Yousof Gawasmeh, Wayne State University, USA
Guy Meadows, University of Michigan-Ann Arbor, USA
Farshad Fotouhi, Wayne State University, USA
This chapter investigates the use of agile program design techniques within an online game develop-
ment laboratory setting. The proposed game concerns the prediction of early Paleo-Indian hunting sites
in ancient North America along a now submerged land bridge that extended between Canada and the
United States across what is now Lake Huron. While the survey of the submerged land bridge was be-
ing conducted, the online class was developing a computer game that would allow scientists to predict
where sites might be located on the landscape. Crucial to this was the ability to gradually add in different levels of cognitive and decision-making capabilities for the agents. The authors argue that the online component of the courses was critical to supporting an agile approach here. The results of the study indeed provided a fusion of both survey and strategic information that suggests that the movement of caribou was asymmetric over the landscape. Therefore, the actual positioning of human artifacts such
as hunting blinds was designed to exploit caribou migration in the fall, as is observed today.
Chapter 12
Management of Distributed Energy Resources Using Intelligent Multi-Agent System...................... 208
Thillainathan Logenthiran, National University of Singapore, Singapore
Dipti Srinivasan, National University of Singapore, Singapore
The technology of intelligent Multi-Agent System (MAS) has radically altered the way in which com-
plex, distributed, open systems are conceptualized. This chapter presents the application of multi-agent
technology to design and deployment of a distributed, cross platform, secure multi-agent framework
to model a restructured energy market, where multiple players dynamically interact with each other to
achieve mutually satisfying outcomes. Apart from the security implementations, some of the best prac-
tices in Artificial Intelligence (AI) techniques were employed in the agent oriented programming to
deliver customized, powerful, intelligent, distributed application software which simulates the new
restructured energy market. The AI algorithm implemented as a rule-based system yielded accurate
market outcomes.
Section 6
Multi-Agent Learning
Chapter 13
Effects of Shaping a Reward on Multiagent Reinforcement Learning................................................ 232
Sachiyo Arai, Chiba University, Japan
The multiagent reinforcement learning approach is now widely applied to make agents behave rationally in a multiagent system. However, due to the complex interactions in a multiagent domain, it is difficult to decide each agent's fair share of the reward for contributing to the goal achievement.
This chapter reviews a reward shaping problem that defines when and what amount of reward should
be given to agents. The author employs keepaway soccer as a typical multiagent continuing task that
requires skilled collaboration between the agents. Shaping the reward structure for this domain is diffi-
cult for the following reasons: i) a continuing task such as keepaway soccer has no explicit goal, and so
it is hard to determine when a reward should be given to the agents, ii) in such a multiagent cooperative
task, it is difficult to fairly share the reward for each agent’s contribution. Through experiments, this
chapter finds that reward shaping has a major effect on an agent’s behavior.
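As a rough illustration of why the choice of when and what to reward matters in a continuing task, the toy sketch below compares a dense per-step reward with a sparse penalty given only when possession is lost, in a keepaway-like setting. The keep probabilities, policies, and episode cap are invented for illustration and do not reproduce the chapter's experiments.

```python
import random

random.seed(1)

def episode(policy_risky_prob, shaped):
    """Toy keepaway episode: each step the team either keeps the ball or loses it.

    shaped=True  -> +1 for every step the ball is kept (dense, per-step reward)
    shaped=False -> -1 only when possession is finally lost (sparse reward)
    """
    total, steps = 0.0, 0
    while steps < 100:
        risky = random.random() < policy_risky_prob
        keep_prob = 0.95 if not risky else 0.6   # hypothetical keep probabilities
        if random.random() < keep_prob:
            steps += 1
            if shaped:
                total += 1.0
        else:
            if not shaped:
                total -= 1.0
            break
    return total, steps

# Compare how the two reward designs separate a cautious policy from a risky one.
for shaped in (True, False):
    for p in (0.1, 0.9):
        rs, ls = zip(*(episode(p, shaped) for _ in range(2000)))
        print(f"shaped={shaped} risky_prob={p}: "
              f"avg reward={sum(rs)/len(rs):.2f}, avg episode length={sum(ls)/len(ls):.1f}")
```

Under the dense scheme the cautious policy clearly earns more reward; under the sparse scheme almost every policy ends with roughly the same -1, so the signal barely distinguishes them, which is exactly the difficulty of deciding when and how much to reward in a continuing task.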
Chapter 14
Swarm Intelligence Based Reputation Model for Open Multiagent Systems..................................... 248
Saba Mahmood, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan
Azzam ul Asar, Department of Electrical and Electronics Eng NWFP
University of Engineering and Technology, Pakistan
Hiroki Suguri, Miyagi University, Japan
Hafiz Farooq Ahmad, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan
In open multiagent systems, individual components act in an autonomous and uncertain manner, thus
making it difficult for the participating agents to interact with one another in a reliable environment.
Trust models have been devised that can create a level of certainty for the interacting agents. However, trust requires reputation information that basically incorporates an agent's former behaviour. There are two aspects of a reputation model, i.e., reputation creation and its distribution. Dissemination of this reputation information in a highly dynamic environment is an issue and needs attention for a better
approach. The authors have proposed a swarm intelligence based mechanism whose self-organizing
behaviour not only provides an efficient way of reputation distribution but also involves various sources
of information to compute the reputation value of the participating agents. They have evaluated their system with the help of a simulation showing the utility gain of agents utilizing the swarm-based reputation system.
Chapter 15
Exploitation-Oriented Learning XoL - A New Approach to Machine Learning
Based on Trial-and-Error Searches...................................................................................................... 267
Kazuteru Miyazaki, National Institution for Academic Degrees
and University Evaluation, Japan
Exploitation-oriented Learning (XoL) is a new framework of reinforcement learning. XoL aims to learn a rational policy whose expected reward per action is larger than zero, and does not require a sophisticated design of the value of a reward signal. In this chapter, as examples of learning systems that belong to XoL, the authors introduce the rationality theorem of Profit Sharing (PS), the rationality theorem of reward sharing in multi-agent PS, and PS-r*. XoL has several features. (1) Though traditional
RL systems require appropriate reward and penalty values, XoL only requires an order of importance
among them. (2) XoL can learn more quickly since it traces successful experiences very strongly. (3)
XoL may be unsuitable for pursuing an optimal policy. The optimal policy can be acquired by the multi-
start method that needs to reset all memories to get a better policy. (4) XoL is effective on the classes
beyond MDPs, since it is a Bellman-free method that does not depend on DP. The authors show several
numerical examples to confirm these features.
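A minimal sketch of the Profit Sharing style of credit assignment on which XoL builds is given below: a reward obtained at the end of an episode is propagated backwards along the visited (state, action) pairs with geometrically decreasing credit, so that successful experiences are traced strongly without a carefully tuned reward value. The toy environment, decay rate, and action-selection rule are assumptions, not the chapter's algorithms.

```python
from collections import defaultdict
import random

random.seed(0)
N_ACTIONS = 2
DECAY = 1.0 / (N_ACTIONS + 1)   # geometric decay; PS rationality results use decays below 1/L

weights = defaultdict(lambda: [0.0] * N_ACTIONS)   # accumulated rule strengths per state

def choose(state):
    """Pick an action in proportion to accumulated rule strengths (plus a small floor)."""
    w = [v + 0.1 for v in weights[state]]
    return random.choices(range(N_ACTIONS), weights=w)[0]

def profit_sharing_update(trajectory, reward):
    """Distribute the final reward backwards along the episode with geometric credit."""
    credit = reward
    for state, action in reversed(trajectory):
        weights[state][action] += credit
        credit *= DECAY

# Toy episodic task: reach state 3 by repeatedly choosing action 1.
for _ in range(500):
    state, trajectory = 0, []
    for _ in range(20):
        a = choose(state)
        trajectory.append((state, a))
        state = state + 1 if a == 1 else 0
        if state == 3:
            profit_sharing_update(trajectory, reward=1.0)
            break

print({s: [round(v, 2) for v in weights[s]] for s in sorted(weights)})
```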
Section 7
Miscellaneous
Chapter 16
Pheromone-Style Communication for Swarm Intelligence................................................................. 294
Hidenori Kawamura, Hokkaido University, Japan
Keiji Suzuki, Hokkaido University, Japan
Pheromones are important chemical substances that social insects use to realize cooperative collective behavior. The most famous example of pheromone-based behavior is foraging. Real ants use pheromone trails to inform each other where a food source exists, and thereby reach and forage the food effectively. This sophisticated but simple communication method is useful for designing artificial multiagent systems. In this chapter, evolutionary pheromone communication is proposed on a competitive ant environment model, and the authors show two patterns of pheromone communication that emerged through a co-evolutionary process driven by a genetic algorithm. In addition, such communication patterns are investigated with Shannon's entropy.
Chapter 17
Evolutionary Search for Cellular Automata with Self-Organizing Properties
toward Controlling Decentralized Pervasive Systems......................................................................... 308
Yusuke Iwase, Nagoya University, Japan
Reiji Suzuki, Nagoya University, Japan
Takaya Arita, Nagoya University, Japan
Cellular Automata (CAs) have been investigated extensively as abstract models of the decentralized
systems composed of autonomous entities characterized by local interactions. However, it is poorly
understood how CAs can interact with their external environment, which would be useful for imple-
menting decentralized pervasive systems that consist of billions of components (nodes, sensors, etc.)
distributed in our everyday environments. This chapter focuses on the emergent properties of CAs
induced by external perturbations toward controlling decentralized pervasive systems. The authors as-
sumed a minimum task in which a CA has to change its global state drastically after every occurrence
of a perturbation period. In the perturbation period, each cell state is modified by using an external rule
with a small probability. By conducting evolutionary searches for rules of CAs, the authors obtained interesting behaviors of CAs in which their global state cyclically transitioned among different stable states
in either ascending or descending order. The self-organizing behaviors are due to the clusters of cell
states that dynamically grow through occurrences of perturbation periods. These results imply that the
global behaviors of decentralized systems can be dynamically controlled by states of randomly selected
components only.
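The interplay between a CA rule and an external perturbation period can be illustrated roughly as follows: an ordinary one-dimensional binary CA is updated synchronously, and at fixed intervals each cell is flipped with a small probability by an external rule. The rule number, interval, and flip probability below are assumptions; the chapter's evolved rules and task are not reproduced here.

```python
import random

random.seed(2)
N, STEPS, PERTURB_EVERY, FLIP_PROB = 64, 200, 50, 0.05

def step(cells, rule=110):
    """One synchronous update of a 1-D binary CA with a given Wolfram rule number."""
    out = []
    for i in range(len(cells)):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % len(cells)]
        idx = (left << 2) | (center << 1) | right
        out.append((rule >> idx) & 1)
    return out

cells = [random.randint(0, 1) for _ in range(N)]
for t in range(STEPS):
    if t % PERTURB_EVERY == 0 and t > 0:
        # Perturbation period: an external rule flips each cell with small probability.
        cells = [1 - c if random.random() < FLIP_PROB else c for c in cells]
    cells = step(cells)
    if t % PERTURB_EVERY == PERTURB_EVERY - 1:
        print(f"t={t:3d}  density={sum(cells)/N:.2f}")
```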
Index.................................................................................................................................................... 349
Preface
ABSTRACT
From a historical viewpoint, the development of multi-agent systems demonstrates how computer sci-
ence has become more social, and how the social sciences have become more computational. With this
development of cross-fertilization, our understanding of multi-agent systems may become partial if we
only focus on computer science or only focus on the social sciences. This book with its 17 chapters
intends to give a balanced sketch of the research frontiers of multi-agent systems. We trace the origins
of the idea, a biologically-inspired approach to multi-agent systems, to John von Neumann, and then
continue his legacy in this volume.
1. GENERAL BACKGROUND
Multi-agent system (MAS) is now an independent, but highly interdisciplinary, scientific subject. It
offers scientists a new research paradigm to study the existing complex natural systems, to understand
the underlying mechanisms by simulating them, and to gain the inspiration to design artificial systems
that can solve highly complex (difficult) problems or can create commercial value. From a historical
viewpoint, the development of multi-agent systems itself demonstrates how computer science has be-
come more social, and, in the meantime, how the social sciences have become more computational.
With this development of cross-fertilization, our understanding of multi-agent systems may become
partial if we only focus on computer science or only focus on the social sciences. A balanced view is
therefore desirable and becomes the main pursuit of this edited volume. In this volume, we attempt to
give a balanced sketch of the research frontiers of multi-agent systems, ranging from computer science
to the social sciences.
While there are many intellectual origins of the MAS, the book “Theory of Self-Reproducing Au-
tomata” by von Neumann (1903-1957) certainly contributes to a significant part of the later development
of MAS (von Neumann, 1966). In particular, it contributes to a special class of MAS, called cellular
automata, which motivates a number of pioneering applications of MAS to the social sciences in the
early 1970s (Albin, 1975). In this book, von Neumann suggested that an appropriate principle for de-
signing artificial automata can be productively inspired by the study of natural automata. Von Neumann
himself spent a great deal of time on the comparative study of the nervous systems or the brain (the
natural automata) and the digital computer (the artificial automata). In his book “The Computer and the
Brain”, von Neumann demonstrates the effect of interaction between the study of natural automata and
the design of artificial automata.
This biologically-inspired principle has been further extended by Arthur Burks, John Holland and
many others. By following this legacy, this volume has this biologically-inspired approach to multi-agent
systems as its focus. The difference is that we are now richly endowed with more natural observations
for inspiration, from evolutionary biology and neuroscience to ethology and entomology. The main
purpose of this book is to ground the design of multi-agent systems in biologically-inspired tools, such
as evolutionary computation, artificial neural networks, reinforcement learning, swarm intelligence,
stigmergic optimization, ant colony optimization, and ant colony clustering.
Given the two well-articulated goals above, this volume covers six subjects, which of course are not
exhaustive but are sufficiently representative of the current important developments of MAS and, in
the meantime, point to the directions for the future. The six subjects are multi-agent financial decision
systems (Chapters 1-2), neuro-inspired agents (Chapters 3-4), bio-inspired agent-based financial markets
(Chapters 5-8), multi-agent robots (Chapters 9-10), multi-agent games and simulation (Chapters 11-12),
and multi-agent learning (Chapters 13-15). Fifteen contributions to this volume are grouped by these subjects into six sections of the volume. In addition to these six sections, a "miscellaneous" section is added to
include two contributions, each of which addresses an important dimension of the development of MAS.
In the following, we would like to give a brief introduction to each of these six subjects.
2. MULTI-AGENT FINANCIAL DECISION SYSTEMS
We start with the multi-agent financial system. The idea of using multi-agent systems to process infor-
mation has a long tradition in economics, even though in early days the term MAS did not even exist.
In this regard, Hayek (1945) is an influential work. Hayek considered the market and the associated
price mechanism as a way of pooling or aggregating the market participants’ limited knowledge of the
economy. While the information owned by each market participant is imperfect, pooling it can generate prices consistent with an efficient allocation of resources. This assertion was later coined
as the Hayek Hypothesis by Vernon Smith (Smith 1982) in his double auction market experiments. The
intensive study of the Hayek hypothesis in experimental economics has further motivated or strengthened
the idea of prediction markets. A prediction market essentially generates an artificial market environ-
ment such that forecasts of crowds can be pooled so as to generate better forecasts. Predicting election
outcomes via what is known as political futures markets has become one of the most prominent applications.
On the other hand, econometricians tend to pool the forecasts made by different forecasting models
so as to improve their forecasting performance. In the literature, this is known as combined forecasts
(Clement 1989). Like prediction markets, combined forecasts tend to enhance the forecast accuracy.
The difference between prediction markets and combined forecasts is that agents in the former case are
heterogeneous in both data (the information acquired) and models (the way to process information),
whereas agents in the latter case are heterogeneous in models only. Hybrid systems in machine learning
or artificial intelligence can be regarded as a further extension of the combined forecasts, for example,
Kooths, Mitze, and Ringhut (2004). Their difference lies in the way they integrate the intelligence of the
crowd. Integration in the case of a combined forecast is much simpler, most of the time, consisting of just
the weighted combination of forecasts made by different agents. This type of integration can function well
because the market price under certain circumstances is just this simple linear combination of a pool of
forecasts. This latter property has been shown by the recent agent-based financial markets. Nevertheless,
the hybrid system is more sophisticated in terms of its integration. It is not just the horizontal combina-
tion of the pool, but also involves the vertical integration of it. In this way, heterogeneous agents do not
just behave independently, but work together as a team (Mumford and Jain, 2009).
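A minimal sketch of a combined forecast of the kind discussed above: several hypothetical forecasting agents produce predictions for the same series, and their outputs are merged with weights inversely proportional to each model's past error. The data, the three model names, and the inverse-MSE weighting rule are illustrative assumptions, not a prescription from the chapters that follow.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0, 1, 200)                       # series to forecast (illustrative)
# Three hypothetical forecasting agents with different error characteristics.
forecasts = {
    "neural_net": y + rng.normal(0, 0.3, 200),
    "gp_model":   y + rng.normal(0, 0.5, 200),
    "regression": y + rng.normal(0, 0.8, 200),
}

# Combined forecast: weights inversely proportional to each model's past MSE.
mse = {k: np.mean((f - y) ** 2) for k, f in forecasts.items()}
weights = {k: 1.0 / v for k, v in mse.items()}
total = sum(weights.values())
weights = {k: w / total for k, w in weights.items()}

combined = sum(w * forecasts[k] for k, w in weights.items())
print("weights:", {k: round(w, 2) for k, w in weights.items()})
print("combined MSE:", round(np.mean((combined - y) ** 2), 3),
      "best single MSE:", round(min(mse.values()), 3))
```

Because the agents' errors are independent in this toy setting, the weighted combination typically beats the best single model, which is the basic appeal of combined forecasts noted above.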
Chapter 1 “A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index” authored
by Mak Kaboudan provides an illustration of the hybrid systems. He provides an agent-based forecasting
system of real estate. The system is composed of three types of agents, namely, artificial neural net-
works, genetic programming and linear regression. The system “aggregates” the dispersed forecasts of
these agents through a competition-cooperation cyclic phase. In the competition phase, best individual
forecasting models are chosen from each type of agent. In the cooperation phase, hybrid systems (rec-
onciliatory models) are constructed by combining artificial neural networks with genetic programming,
or by combining artificial neural networks with regression models, based on the solutions of the first
phase. Finally, there is a competition again for individual models and reconciliatory models.
Chapter 2 “An Agent-based Model for Portfolio Optimization Using Search Space Splitting” authored
by Yukiko Orito, Yasushi Kambayashi, Yasuhiro Tsujimura and Hisashi Yamamoto proposes a novel ver-
sion of genetic algorithms to solve the portfolio optimization problem. Genetic algorithms are population-
based search algorithms; hence, they can naturally be considered to be an agent-based approach, if we
treat each individual in the population as an agent. In Orito et al.’s case, each agent is an investor with
a portfolio over a set of assets. However, the authors do not use the standard single-population genetic
algorithm to drive the evolutionary dynamics of the portfolios. Instead, the whole society is divided into
many sub-populations (clusters of investors), within each of which there is a leader. The interactions of
agents are determined by their associated behavioral characteristics, such as leaders, obedient followers
or disobedient followers. These clusters and behavioral characteristics can constantly change during
the evolution: new leaders with new clusters may emerge to replace the existing ones. Like the previous
chapter, this chapter shows that the wisdom of crowds emerges from complex social dynamics rather
than just a static weighted combination.
3. NEURO-INSPIRED AGENTS
Our brain itself is a multi-agent system; therefore, it is natural to study the brain as a multi-agent system
(de Garis 2008). In this direction, MAS is applied to neuroscience. However, the other direction also
exists. One recent development in multi-agent systems is to make software agents more human-like.
Various human factors, such as cognitive capacity, intelligence, personality attributes, emotion, and
cultural differences, have become new working dimensions for software agents. Since these human
factors have now been intensively studied in neuroscience with regard to their neural correlates, it is not
surprising to see that the design of autonomous agents, under this influence, will be grounded deeper
into neuroscience. Hence, the progress of neuroscience can impact the design of autonomous agents in
MAS. The next two chapters are written to feature this future.
Chapter 3 “Neuroeconomics: A Viewpoint from Agent-Based Computational Economics” by Shu-
Heng Chen and Shu G. Wang gives a review of how the recent progress in neuroeconomics may shed
light on different components of autonomous agents, including their preference formation, alternatives
valuation, choice making, risk perception, risk preferences, choice making under risk, and learning. The
last part of their review covers the well-known dual system conjecture, which is now the centerpiece
of neuroeconomic theory.
Chapter 4 “Agents in Quantum and Neural Uncertainty” authored by Germano Resconi and Boris
Kovalerchuk raises a very fundamental issue: does our brain fuzzify the received signals, even when
they are presented in a crispy way? They then further inquire into the nature of uncertainty and propose
a notion of uncertainty which is neural theoretic. A two-layered neural network is proposed to be able
to transform crisp signals into multi-valued outputs (fuzzy outputs). In this way, the source of fuzziness
comes from the conflicting evaluations of the same inputs made by different neurons, to some extent,
like Minsky’s society of minds (Minsky, 1998). Using various brain image technologies, the current
study of neuroscience has already explored various neural correlates when subjects are presented with
vague, incomplete and inconsistent information. This mounting evidence may put the modal logic under
a close examination and motivate us to think about some alternatives, like dynamic logic.
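The idea that fuzziness can arise from the fusion of many crisp, conflicting judgments can be sketched as follows: a population of "neurons", each applying its own crisp threshold to the same input, is fused by simple averaging into a graded membership value. This is only a schematic reading of the agent-based view of uncertainty, not the AUT formalism itself; the thresholds and the averaging fusion rule are assumptions.

```python
import numpy as np

# A population of "neurons" (agents) each judges the same crisp input x against
# its own threshold; the fusion layer turns their conflicting 0/1 judgments
# into a graded (fuzzy-like) membership value.
rng = np.random.default_rng(1)
thresholds = rng.normal(loc=50.0, scale=8.0, size=100)   # each agent's own crisp criterion

def crisp_judgments(x):
    return (x >= thresholds).astype(float)      # every individual judgment is crisp

def fused_membership(x):
    return crisp_judgments(x).mean()            # conflict among agents yields a degree

for x in (35, 45, 50, 55, 65):
    print(f"x={x}: membership={fused_membership(x):.2f}")
```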
4. BIO-INSPIRED AGENT-BASED ARTIFICIAL MARKETS
The third subject of this volume is bio-inspired agent-based artificial markets. The market is another natural
demonstration of multi-agent systems. In fact, over the last decade, the market mechanism has inspired
the design of MAS, known as the market-based algorithm. To some extent, it has also revolutionized the
research paradigm of artificial intelligence by motivating the distributed AI. However, in a reverse direc-
tion, MAS also provides economists with a powerful tool to explore and to test the market mechanism.
This research helps them to learn when markets may fail and hence learn how to do market designs.
Nevertheless, the function of markets is not just about the institutional design (the so-called structur-
alism); a significant number of studies of artificial markets have found that institutional design is not
behavior-free or culture-free. This behavioral awareness and cultural awareness has now also become a
research direction in experimental economics and agent-based computational economics.
The four chapters contributing to this section all adopt a behavioral approach to the study of artificial
markets. Chapter 5 “Bounded Rationality and Market Micro-Behaviors: Case Studies Based on Agent-
Based Double Auction Markets” authored by Shu-Heng Chen, Ren-Jie Zeng, Tina Yu and Shu G Wang
can be read as an example of the recent attempt to model agents with different cognitive capacities or
intelligence. It is clear that human agents are heterogeneous in their cognitive capacity (intelligence),
and the effect of this heterogeneity on their economic and social status has been found in many recent
studies ranging from psychology and sociology to economics; nevertheless, conventional agent-based
models paid little attention to this development, and in most cases agents were explicitly or implicitly
assumed to be equally smart. By using genetic programming parameterized with different population
sizes, this chapter provides a pioneering study to examine the effect of cognitive capacity on the dis-
covery of trading strategies. It is found that larger cognitive capacity can contribute to the discovery
of more complex but more profitable strategies. It is also found that different cognitive capacity may
coordinate different matches of strategies of players in a co-evolutionary fashion, while they are not
necessarily the Nash equilibria.
Chapter 6 “Social Simulation with both Human Agents and Software Agents: An Investigation into
the Impact of Cognitive Capacity on Their Learning Behavior” authored by Shu-Heng Chen, Chung-
Ching Tai, Tzai-Der Wang and Shu G Wang. This chapter can be considered to be a continuation of the
cognitive agent-based models. What differs from the previous one is that this chapter considers not only
software agents with different cognitive capacity which is manipulated in the same way as in the previ-
ous chapter, but also human agents with different working memory capacity. A test borrowed
from psychology is employed to measure the working memory capacity of human subjects. By placing
software agents and human agents separately in a similar environment (double auction markets, in this
case) to play against the same group of opponents (Santa Fe program agents), they are able to examine
whether the economic significance of intelligence observed from human agents can be comparable to
that observed in the software agents, and hence to evaluate how well the artificial cognitive capacity has
mimicked the human cognitive capacity.
Chapter 7 “Evolution of Agents in a Simple Artificial Market” authored by Hiroshi Sato, Masao Kubo
and Akira Namatame is a work devoted to the growing literature on agent-based artificial stock markets.
As Chen, Chang and Du (2010) have surveyed, from the viewpoint of agent engineering, there are two
major classes of agent-based artificial stock markets. One comprises the H-type agent-based financial
models, and the other, the Santa-Fe-like agent-based financial models. The former has the agents whose
behavioral rules are known and, to some extent, are fixed and simple. The latter has the agents who are
basically autonomous, and their behavior, in general, can be quite complex. This chapter belongs to the
former, and considers two types of agents: rational investors and imitators. It uses the standard stochastic
utility function as the basis for deriving the Gibbs-Boltzmann distribution as the learning mechanism of
agents and shows the evolving microstructure (fraction) of these two types of agents and its connection
to the complex dynamics of financial markets.
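The Gibbs-Boltzmann learning mechanism mentioned above amounts to a softmax choice rule: the probability of adopting a behavioral type is proportional to the exponential of its recent utility, with an intensity-of-choice parameter controlling how sharply agents favor the better-performing type. A minimal sketch follows; the payoffs and parameter values are illustrative assumptions, not the chapter's calibration.

```python
import numpy as np

def gibbs_choice_probs(utilities, beta=1.0):
    """Gibbs-Boltzmann (softmax) choice: P(i) is proportional to exp(beta * U_i)."""
    u = np.asarray(utilities, dtype=float)
    z = np.exp(beta * (u - u.max()))             # subtract max for numerical stability
    return z / z.sum()

# Example: a trader weighing the recent payoffs of "rational trader" vs "imitator" behavior.
payoffs = [0.8, 0.5]
for beta in (0.5, 2.0, 10.0):                    # higher beta -> closer to best response
    print(f"beta={beta}: P(rational, imitator) = "
          f"{np.round(gibbs_choice_probs(payoffs, beta), 2)}")
```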
Chapter 8 “Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets”
authored by Hiroshi Takahashi and Takao Terano is another contribution to agent-based artificial stock
markets. It shares some similarities with the previous chapter; mainly, they both belong to the H-type
agent-based financial markets, categorized in Chen, Chang and Du (2010). However, this chapter distin-
guishes itself by incorporating the ingredients of behavioral finance into agent-based financial models,
a research trend perceived in Chen and Liao (2004). Specifically, this chapter considers passive and
active investors, overconfident investors, and prospects-based investors (Kahneman-Tversky inves-
tors). Within this framework, the authors address two frequently-raised issues in the literature. The first
one is the issue pertaining to survival analysis: among different types of agents, who can survive, and
under what circumstances? The second issue pertains to the traceability of the fundamental prices by
the market price: how far and for how long can the market price deviate from the fundamental price?
Their results and many others in the literature seem to indicate that the inclusion of behavioral factors
can quite strongly and persistently cause the market price to deviate from the fundamental price, and
that persistent deviation can exist even after allowing agents to learn.
5. MULTI-AGENT ROBOTICS
Section 4 comes to one of the most prominent applications of multi-agent systems, i.e., multi-agent robot-
ics. RoboCup (robotic soccer games), which was initiated in 1997, provides one of the exemplary
cases (Kitano, 1998). In this case, one has to build a team of agents that can play a soccer game against
a team of robotic opponents. The motivation of RoboCup is that playing soccer successfully demands
a range of different skills, such as real-time dynamic coordination using limited communication band-
width. Obviously, a formidable task in this research area is how to coordinate these autonomous agents
(robots) coherently so that a common goal can be achieved. This requires each autonomous robot to
follow a set of behavioral rules, and when they are placed in a distributed interacting environment, the
individual operation of these rules can collectively generate a desirable pattern. This issue is so basic
that it already existed at the very beginning of MAS, in forms such as pattern formation in cellular automata. The
simple cellular automata are homogeneous in the sense that all automata follow the same set of rules,
and there is a mapping from these sets of rules to the emergent patterns. Wolfram (2002) has worked
this out in quite some detail.
Multi-robot systems can be considered to be an extension of the simple cellular automata. The issue
pursued here is an inverse engineering problem. Instead of asking what pattern emerges given a set of
rules, we are now asking what set of rules is required to generate certain kinds of patterns. This is the
coordination problem for not only multi-agent robots but also other kinds of MAS. Given the complex
structure of this problem, it is not surprising to see that evolutionary computation has been applied to
tackle this issue. In this part, we shall see two such studies.
Chapter 9 “Autonomous Specialization in a Multi-Robot System using Evolving Neural Networks”
authored by Masanori Goka and Kazuhiro Ohkura gives a concrete coordination problem for robots.
Ten autonomous mobile robots have to push three packages to the goal line. Each of these autonomous
robots is designed with a continuous-time recurrent artificial neural network. Their coordination
is solved using evolutionary strategies and genetic algorithms. In the former case, the network structure
is fixed and only the connection weights evolve; in the latter case, the network structure is also evolved
with the connection weights. It has been shown that in the latter case and in the later stage, the team of
robots develops a kind of autonomous specialization, which divides the entire team into three sub-teams
to take care of each of the three packages separately.
Chapter 10 “A Multi-Robot System Using Mobile Agents with Ant Colony Clustering” authored by
Yasushi Kambayashi, Yasuhiro Tsujimura, Hidemi Yamachi, and Munehiro Takimoto presents another
coordination problem of the multi-robot systems. In their case, the robots are the luggage carts used
in the airports. These carts are picked up by travelers at designated points and left in arbitrary places.
They are then collected manually one by one, which is very laborious. Therefore, an intelligent design
is concerned with how these carts can draw themselves together at designated points, and how these
gathering places are determined. The authors apply the idea of mobile agents in this study. Mobile agents
are programs that can transmit themselves across an electronic network and recommence execution at
a remote site (Cockayne and Zyda, 1998). In this chapter, mobile agents are employed as the medium between the host computer (a simulating agent) and all these scattered carts via devices of RFID (Radio
Frequency Identification). The mobile software agent will first collect information with regard to the
initial distribution of these luggage carts, and this information will be sent back to the host computer,
which will then use ant colony clustering, an idea motivated by ant corpse gathering and brood sorting
behavior, to figure out the places to which these carts should return. The designated place for each cart
is then transmitted to each cart again via the mobile software agent.
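The clustering step run on the host computer can be sketched with the standard pick-up/drop-off probabilities used in ant-based clustering: a low local density makes an isolated cart likely to be picked up, while a high local density makes a carried cart likely to be dropped, so carts accumulate at a few gathering points. The grid, constants, and random-walk movement rule below are assumptions for illustration, not the chapter's implementation.

```python
import random

random.seed(3)
GRID, N_CARTS, N_ANTS, STEPS = 30, 60, 10, 20000
K1, K2, RADIUS = 0.1, 0.15, 2

carts = {(random.randrange(GRID), random.randrange(GRID)) for _ in range(N_CARTS)}
ants = [{"pos": (random.randrange(GRID), random.randrange(GRID)), "load": None}
        for _ in range(N_ANTS)]

def local_density(cell):
    """Fraction of neighboring cells (with wraparound) currently occupied by carts."""
    x, y = cell
    neigh = [(i, j) for i in range(x - RADIUS, x + RADIUS + 1)
                    for j in range(y - RADIUS, y + RADIUS + 1) if (i, j) != cell]
    return sum((i % GRID, j % GRID) in carts for i, j in neigh) / len(neigh)

def move(pos):
    x, y = pos
    return ((x + random.choice((-1, 0, 1))) % GRID,
            (y + random.choice((-1, 0, 1))) % GRID)

for _ in range(STEPS):
    for ant in ants:
        ant["pos"] = move(ant["pos"])
        f = local_density(ant["pos"])
        if ant["load"] is None and ant["pos"] in carts:
            # Isolated carts (low local density) are likely to be picked up.
            if random.random() < (K1 / (K1 + f)) ** 2:
                carts.discard(ant["pos"])
                ant["load"] = True
        elif ant["load"] is not None and ant["pos"] not in carts:
            # Carried carts are likely to be dropped where many carts already are.
            if random.random() < (f / (K2 + f)) ** 2:
                carts.add(ant["pos"])
                ant["load"] = None

print(f"{len(carts)} carts placed on the grid; sample cluster positions: {sorted(carts)[:5]}")
```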
The two chapters in this part stand in sharp and interesting contrast. The former involves the physical
movement of robots during the coordination process, whereas the latter does not involve physical move-
ment until the coordination problem has been solved via the simulation. In addition, the former uses
the bottom-up (decentralized) approach to cope with the coordination problem, whereas the latter uses
the top-down (centralized) approach to cope with the coordination problem, even though the employed
ant colony clustering itself is decentralized in nature. It has been argued that the distributed system can
coordinate itself well, for example, in the well-known El Farol problem (Arthur, 1994). In the intelligent
transportation system, it has also been proposed that a software driver be designed that can learn and can assist human drivers in avoiding congested traffic routes, if these software drivers can be properly coordinated
first (Sasaki, Flann and Box, 2005). Certainly, this development may continue and, after reading these
two chapters, readers may be motivated to explore more on their own.
6. MULTI-AGENT GAMES AND SIMULATIONS
The analysis of social group dynamics through gaming for sharing of understanding, problem solving
and education can be closely tied to MAS. This idea has been freshly demonstrated in Arai, Deguchi and
Matshi (2006). In this volume, we include two chapters contributing to gaming simulation.
Chapter 11 “Agile Design of Reality Games Online” authored by Robert Reynolds, John O’Shea,
Farshad Fotouhi, James Fogarty, Kevin Vitale and Guy Meadows is a contribution to the design of on-
line games. The authors introduce agile programming as an alternative to the conventional waterfall
model. In the waterfall model, the software development goes through a sequential process, which
demands that every phase of the project be completed before the next phase can begin. Yet, very little
communication occurs during the hand-offs between the specialized groups responsible for each phase
of development. Hence, when a waterfall project wraps, this heads-down style of programming may create a product that is not actually what the customer wants. Agile programming is then proposed as
an alternative to help software development teams react to the instability of building software through
an incremental and iterative work cycle, which is detailed in this chapter. The chapter then shows how
this incremental and iterative work cycle has been applied to develop an agent-based hunter-deer-wolf
game. In these cases, agents are individual hunters, deer, and wolves. Each of these individuals can work
on his own, but each of them also belongs to a group, a herd or a pack so that they may learn socially.
Social intelligence (swarm intelligence) can, therefore, be placed into this game; for example, agents
can learn via cultural algorithms (Reynolds, 1994, 1999). The results of these games are provided to a
group of archaeologists as an inspiration for their search for evidence of human activity in ancient times.
Chapter 12 “Management of Distributed Energy Resources Using Intelligent Multi-Agent System”
authored by T Logenthiran and Dipti Srinivasan is a contribution to the young but rapidly-growing
literature on the agent-based modeling of electric power markets (Weidlich, 2008). The emergence of
this research area is closely related to the recent trend of deregulating electricity markets, which may
introduce competition to each constituent of the originally vertically-integrated industry, from generation,
transmission to distribution. Hence, not surprisingly, multi-agent systems have been applied to model
the competitive ecology of this industry. Unlike those chapters in Part III, this chapter is not directly involved in the competitive behavior of buyers and sellers in the electricity markets. Instead, it provides
an in-depth description of the development of the simulation software for electricity markets. It clearly
specifies each agent, in addition to the power-generating companies and consumers, of electricity mar-
kets. Then they show how this knowledge can be functionally integrated into simulation software using
a multi-agent platform, such as JADE (Java Agent DEvelopment Framework).
7. MULTI-AGENT LEARNING
The sixth subject of the book is about learning in the context of MAS. Since the publication of Bush
and Mosteller (1955) and Luce (1959), reinforcement learning is no longer just a subject of psychology
itself, but has proved to be important for many other disciplines, such as economics and games. Since
the seminal work of Samuel (1959) on the checkers-playing program, the application of reinforcement
learning to games is already fifty years old. The influential work by Sutton and Barto (1998) has
further pushed these ideas so that they are now widely used in artificial intelligence and control theory.
The advancement of various brain-image technologies, such as fMRI and positron emission tomogra-
phy, has enabled us to see how our brain has the built-in mechanism required for the implementation
of reinforcement learning. The description of reinforcement learning systems actually matches the
behavior of specific neural systems in the mammalian brain. One of the most important such systems is
the dopamine system and the role that it plays in learning about rewards and directing our choices that
lead us to rewards (Daw, 2003; Montague, 2007).
However, like other multi-disciplinary developments, reinforcement learning also faces challenging issues.
A long-standing fundamental issue is the design, or determination, of the reward function, i.e.,
reward as a function of state and action. Chapter 13 “Effects of Shaping a Reward on Multiagent Rein-
forcement Learning” by Sachiyo Arai and Nobuyuki Tanaka addresses two situations which may make
reward function exceedingly difficult to design. In the first case, the task is continuously ongoing and it is
not clear when to give rewards. The second case involves the learning of team members as a whole instead of
individually. To make the team achieve its common goal, it may not be desirable to distribute the rewards
evenly among team members, but the situation can be worse if the rewards are not properly attributed
to the few deserving individuals. Arai and Tanaka address these two issues in the context of keepaway
soccer, in which a team tries to maintain ball possession by avoiding the opponent’s interceptions.
Trust has long been a central issue in multi-agent systems. This is so because in many situations
agents have to decide with whom they want to interact and what strategies to use. By all means, they
want to be able to manage the risk of interacting with malicious agents. Hence, evaluating the trustwor-
thiness of “strangers” becomes crucial. People in daily life are willing to invest in gaining informa-
tion to deal with this uncertainty. Various social systems, such as rating agencies and social networks,
have been constructed to facilitate acquiring the reputations of agents. Chapter 14 “Swarm
Intelligence Based Reputation Model for Open Multiagent Systems” by Saba Mahmood, Assam Asar,
Hiroki Suguri and Hafiz Ahmad deals with the dissemination of updated reputations of agents. After
reviewing the existing reputation models (both centralized and decentralized ones), the authors propose
their construction using ant colony optimization.
Chapter 15 “Exploitation-oriented Learning XoL: A New Approach to Machine Learning Based on
Trial-and-Error Searches” by Kazuteru Miyazaki is also a contribution to reinforcement learning. As
we have said earlier, a fundamental challenge for reinforcement learning is the design of the reward
function. In this chapter, Miyazaki proposes a novel version of reinforcement learning based on many
of his earlier works on the rationality theorem of profit sharing. This new version, called XoL, differs
from the usual one in that reward signals only require an order of importance among the actions, which
facilitates the reward design. In addition, XoL is a Bellman-free method since it can work on the classes
beyond Markov decision processes. XoL can also learn fast because it traces successful experiences very
strongly. While the resultant solution can be biased, a cure is available through the multi-start method
proposed by the author.
8. MISCELLANEOUS
The last part of the book has “Miscellaneous” as its title. There are two chapters in this part. While these
two chapters could be related to and re-classified into some of the previous parts, we prefer to let them
“stand out” here so as not to blur their unique coverage. The first one in this part (Chapter 16) is related to
multi-agent robotics and also to multi-agent learning, but it is the only chapter devoted to the simulation
of the behavior of insects, namely, the ant war. It is an application of MAS to entomology, or computa-
tional entomology, a biologically inspired approach applied back to the study of biology. The last
chapter of the book (Chapter 17) is devoted to cellular automata, an idea widely shared in many other
chapters of the book, but it is the only chapter which exclusively deals with this subject with an in-depth
review. As we have mentioned earlier, one of the origins of the multi-agent systems is von Neumann’s
cellular automata. It is fitting, then, that the book closes with the most recent developments on this subject.
Chapter 16 “Pheromone-Style Communication for Swarm Intelligence” authored by Hidenori Kawamura
and Keiji Suzuki simulates two teams of ants competing for food. What concerns the authors is how
ants effectively communicate with their teammates so that the food collected can be maximized. In a
sense, this chapter is similar to the coordination problems observed in RoboCup. The difference is that
insects like ants or termites are cognitively even more limited than robots in RoboCup. Their decisions
and actions are rather random, require no memory and no prior knowledge, and do not involve
learning in an explicit way. Individually speaking, they are comparable to what is known to economists
as zero-intelligence agents (Gode and Sunder, 1993). Yet, entomologists have found that they can com-
municate well. The communication is, however, not necessarily direct, but more indirect, partially due to
their poor visibility. Their reliance on indirect communication was noticed by the French biologist
Pierre-Paul Grassé (1895-1985), who termed this style of communication or interaction stigmergy
(Grosan and Abraham, 2006).
He defined stigmergy as: “Stimulation of workers by the performance they have achieved.” Stigmergy
is a method of communication in which individuals communicate with one another by modifying
their local environment. The price mechanism familiar to economists is an example of stigmergy.
It does not require market participants to have direct interaction, but only indirect interaction via price
signals. In this case the environment is characterized as the price, which is constantly changed by market
participants and hence constantly invites further actions from others.
In this chapter, Kawamura and Suzuki use genetic algorithms to simulate the co-evolution processes of
the emergent stigmergic communication among ants. While this study is specifically placed in a context
of an ant war, it should not be hard to see its potential in a more general context, such as the evolution
of language, norms and culture.
Chapter 17 “Evolutionary Search for Cellular Automata with Self-Organizing Properties toward
Controlling Decentralized Pervasive Systems” authored by Yusuke Iwase, Reiji Suzuki and Takaya Arita
brings us back to where we began in this introductory chapter, namely, cellular automata. As we have
noted, from a design perspective, the fundamental issue is an inverse engineering problem, i.e., to
find the rules of automata from which our desired patterns can emerge. This chapter basically deals with
this kind of issue but in an environment different from the conventional cellular automata. The cellular
automata are normally run in a closed system. In this chapter, the authors consider an interesting exten-
sion by exposing them to an open environment or a pervasive system. In this case, each automaton will
receive external perturbations probabilistically. These perturbations then change the operating rules
of the affected cells, which in turn may have global effects. Having anticipated these properties, the
authors then use genetic algorithms to search for rules that may best work with these perturbations to
achieve a given task.
The issues and simulations presented in this chapter can have applications to social dynamics. For
example, citizens interact with each other in a relatively closed system, but each citizen may travel out
once in a while. When they return, their behavioral rules may change due to cultural exchange; hence
they will have an effect on their neighbors that may even have a global impact on the social dynamics.
In the same vein, the other city, which hosts these guests, may experience similar kinds of changes. In this
way, the two systems (cities) are coupled together. People in cultural studies may be inspired by the
simulation presented in this chapter.
9. CONCLUDING REMARKS
As computer science becomes more social and the social sciences become more computational,
publications that can facilitate the dialogue between the two disciplines are in demand. This edited volume
demonstrates our efforts to work toward this. It is our hope that more books or edited volumes produced jointly
by computer scientists and social scientists will come, and, eventually, computer science will
help social scientists to piece together their “fragmented” social sciences, and the social sciences will
constantly provide computer scientists with fresh inspiration in defining and forming their new and
creative research paradigm. The dialogue between artificial automata and natural automata will then
continue and thrive.
Shu-Heng Chen
National Chengchi University, Taiwan
Yasushi Kambayashi
Nippon Institute of Technology, Japan
Hiroshi Sato
National Defense Academy, Japan
REFERENCES
Albin, P. (1975). The Analysis of Complex Socioeconomic Systems. Lexington, MA: Lexington Books.
Arai, K., Deguchi, H., & Matsui, H. (2006). Agent-Based Modeling Meets Gaming Simulation. Springer.
Arthur, B. (1994). Inductive reasoning and bounded rationality. American Economic Review, 84(2),
406–411.
Bush, R.R., & Mosteller, F. (1955). Stochastic Models for Learning. New York: John Wiley & Sons.
Hayek, F. (1945). The use of knowledge in society. American Economic Review, 35(4), 519-530.
Chen, S.-H., & Liao, C.-C. (2004). Behavior finance and agent-based computational finance: Toward an
integrating framework. Journal of Management and Economics, 8.
Chen, S.-H., Chang, C.-L., & Du, Y.-R. (in press). Agent-based economic models and econometrics.
Knowledge Engineering Review.
Clement, R. (1989). Combining forecasts: A review and annotated bibliography. International Journal
of Forecasting, 5, 559-583.
Cockayne, W., & Zyda, M. (1998). Mobile Agents. Prentice Hall.
De Garis, H. (2008). Artificial brains: An evolved neural net module approach. In J. Fulcher & L. Jain
(Eds.), Computational Intelligence: A Compendium. Springer.
Daw, N. (2003). Reinforcement Learning Models of the Dopamine System and Their Behavioral Implica-
tions. Doctoral Dissertation. Carnegie Mellon University.
Grosan, C., & Abraham, A. (2006) Stigmergic optimization: Inspiration, technologies and perspectives.
In A. Abraham, C. Grosan, & V. Ramos (Eds.), Stigmergic Optimization (pp. 1-24). Springer.
Gode, D., & Sunder, S. (1993). Allocative efficiency of markets with zero intelligence traders: Market
as a partial substitute for individual rationality. Journal of Political Economy, 101,119-137.
Kitano, H. (Ed.) (1998) RoboCup-97: Robot Soccer World Cup I. Springer.
Kooths, S., Mitze, T., & Ringhut, E. (2004). Forecasting the EMU inflation rate: Linear econometric
versus non-linear computational models using genetic neural fuzzy systems. Advances in Econometrics,
19, 145-173.
Luce, D. (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley.
Minsky, M. (1988). The Society of Mind. Simon & Schuster.
Montague, R. (2007). Your Brain Is (Almost) Perfect: How We Make Decisions. Plume.
Mumford, C., & Jain, L. (2009). Computational Intelligence: Collaboration, Fusion and Emergence.
Springer.
Reynolds, R. (1994). An introduction to cultural algorithms. In Proceedings of the 3rd Annual Confer-
ence on Evolutionary Programming (pp. 131-139). World Scientific Publishing.
Reynolds, R. (1999). An overview of cultural algorithms. In D. Corne, F. Glover, M. Dorigo (Eds.), New
Ideas in Optimization (pp. 367-378). McGraw Hill Press.
Samuel, A. (1959). Some studies in machine learning using the game of checkers. IBM Journal, 3(3),
210-229.
Sasaki, Y., Flann, N., & Box, P. (2005). The multi-agent games by reinforcement learning applied to on-line
optimization of traffic policy. In S.-H. Chen, L. Jain & C.-C. Tai (Eds.), Computational Economics: A
Perspective from Computational Intelligence (pp. 161-176). Idea Group Publishing.
Acknowledgment
The editors would like to acknowledge the assistance of all involved in the collection and review process
of this book, without whose support the project could not have been completed. We wish to thank all the
authors for their great insights and excellent contributions to this book. Thanks to the publishing team
at IGI Global for their constant support throughout the whole process. In particular, special thanks to
Julia Mosemann for her patience in taking this project to fruition.
Shu-Heng Chen
National Chengchi University, Taiwan
Yasushi Kambayashi
Nippon Institute of Technology, Japan
Hiroshi Sato
National Defense Academy, Japan
Section 1
Multi-Agent Financial Decision
Systems
Chapter 1
A Multi-Agent System
Forecast of the S&P/Case-
Shiller LA Home Price Index
Mak Kaboudan
University of Redlands, USA
ABSTRACT
Successful decision-making by home-owners, lending institutions, and real estate developers among
others is dependent on obtaining reasonable forecasts of residential home prices. For decades, home-
price forecasts were produced by agents utilizing academically well-established statistical models. In this
chapter, several modeling agents will compete and cooperate to produce a single forecast. A cooperative
multi-agent system (MAS) is developed and used to obtain monthly forecasts (April 2008 through March
2010) of the S&P/Case-Shiller home price index for Los Angeles, CA (LXXR). Monthly housing market
demand and supply variables including conventional 30-year fixed real mortgage rate, real personal
income, cash out loans, homes for sale, change in housing inventory, and construction material price
index are used to find different independent models that explain percentage change in LXXR. An agent
then combines the forecasts obtained from the different models to obtain a final prediction.
… housing market. For example, Ludwig and Torsten (2001) quantify the impact of changes in home prices on consumption in 16 OECD countries, and in Australia the government expected “moderating consumption growth as wealth effects from house price and share price movements stabilise” (Commonwealth of Australia, 2001). Changes in home prices in a relatively large economy (such as that of the U.S.) also affect economic conditions in others. For example, Haji (2007) discussed the impact of the U.S. subprime mortgage crisis on the global financial markets. Reports in the Chinese news (e.g., China Bystanders, 2008) reveal that banks in China are experiencing lower profits due to losses on trading mortgage-related securities. On August 30, 2007, a World Economy report published by the Economist stated that “subprime losses are popping up from Canada to China”. On May 16, 2008, CNN Money.com (2008) published a summary report of the mid-year update of the U.N. World Economic Situation and Prospects 2008. The U.N. report stated that the world economy is expected to grow only 1.8% in 2008 and the downturn is expected to continue with only a slightly higher growth of 2.1% in 2009. The slow growth is blamed on further deterioration in the U.S. housing and financial sectors that is expected to “continue to be a major drag for the world economy extending into 2009.” Given the economic impacts of changes in home prices, accurate predictions of future changes probably help project economic conditions better.

Since April 2006, forecasting home prices gained additional importance and therefore attention after the Chicago Mercantile Exchange (CME) began trading in futures and options on housing. Investors can trade the CME Housing futures contracts to profit in up or down housing markets or to protect themselves against market price fluctuations (CME Group, 2007). Initially, prices of those contracts were determined according to indexes of median home prices in ten metropolitan statistical areas (MSAs): Boston, Chicago, Denver, Las Vegas, Los Angeles, Miami, New York, San Diego, San Francisco, and Washington, D.C., as well as a composite index of all 10 cities (CME, 2007). A second composite index was later introduced to include twenty metropolitan areas. Additionally, it is calculated for Atlanta, Charlotte, Cleveland, Dallas, Detroit, Minneapolis, Phoenix, Portland, Seattle, and Tampa as well as a composite index of all 20 MSAs. These are financial tools to trade U.S. real estate values and are based on the S&P/Case-Shiller Indexes (CSIs) for all 20 cities and the two composites. CSIs are recognized as “the most trustworthy and authoritative house price change measures available” (Iacono, 2008). Case and Shiller (1989 and 1990) presented an early introduction of the index, its importance, and forecasts.

This chapter focuses on forecasting the S&P/Case-Shiller index for the Los Angeles MSA (LXXR). Two reasons explain why LXXR is selected. First, modeling and predicting only one index is a challenge to be addressed first before tackling 20 indexes in different locations that are characteristically heterogeneous markets and before predicting either composite. Second, the Los Angeles area housing market is one of the hardest hit by the unprecedented subprime financial problems. The plan is to predict monthly percentage changes in LXXR for 24 months (April of 2008 through March of 2010). Monthly percentage change in LXXR = %D_LXXRt = 100*{Ln(LXXRt) - Ln(LXXRt-1)}, where Ln = natural logarithm, and t = 1, …, T months. (Hereon, %D_Xt = 100*{Ln(Xt) - Ln(Xt-1)}.) Input data used is monthly and covers the period from January of 1992 through March of 2008. The forecast is developed in stages. In the first, variables (Xi where i = 1, …, n) suspected of creating temporal variations in LXXR are logically and intuitively identified, data of those variables are collected, then variables that best explain variation in LXXR (Xj where j = 1, …, k and k ⊆ n) are determined using genetic programming (GP). Variables identified as best
in the first stage are forecasted for 24 months in the second. A multi-agent system (MAS) is finally developed to model the monthly percent change in LXXR (%D_LXXR). In this third stage, MAS is a network of computational techniques employed first to obtain several independent “best” forecasts of %D_LXXR. By assuming that each of the techniques employed captures the variable's dynamics over history (1992-2008) at least partially, a single agent then independently takes the forecasts produced (by the techniques employed) as input to produce a single forecast as output.

Ideally, the best forecast should be evaluated relative to others published in the literature. However, an extensive literature search failed to find any monthly forecast of LXXR or %D_LXXR. (Most probably this is the first study to model and predict LXXR monthly.) Only independent forecasts of annual changes expected in CA were found. For example, the California Association of Realtors (C.A.R.) publishes an annual forecast for the entire state. Their quarterly forecast (C.A.R., 2008) projects a modest price increase in the second half of 2008 and in 2009. Their annual forecast (Appleton-Young, 2008) projects a decline of about 5% for 2008. A different forecast is produced by Housing Predictor (2008), which predicts that home prices will decline by 12.8% in 2008. Only two forecasts of the S&P/Case-Shiller indexes were found: Moody's Economy.com (2008) and Stark (2008). Moody's forecast is of the 20-city composite index and is only presented graphically. It suggested that housing prices will continue to decline through the middle of 2009 when they are expected to start bottoming out. Stark (2008) predicted that the composite index will decline by 12% in 2008, decline by 0.3% in 2009, and increase by 3.8% in 2010. Only one city forecast of CSI was found. BostonBubble.com (2007) published a forecast of the index for the Boston MSA through 2011. They project that the Boston CSI will continue to decline until April of 2010.

The balance of this chapter has four sections. Before describing the methodology used to produce a forecast of %D_LXXR using multi-agent systems, the S&P/Case-Shiller Index is briefly introduced. Estimation results are presented next, followed by the forecasts obtained. The final section has the conclusion.

THE S&P/CASE-SHILLER INDEX

The Case/Shiller indexes are designed to measure changes in the market value of residential real estate in each Metropolitan Statistical Area (MSA) by tracking the values of single-family housing within the United States. It measures changes in housing prices with homes sold held at a constant level of quality by utilizing data of matched sale pairs for pre-existing homes. In short, its calculation is based upon repeat sales of existing homes. For each MSA, a three-month moving average is calculated. The monthly moving average is of sales pairs found for that month and the preceding two months. A Standard & Poor's report (2008a) contains a full description of how the indexes are calculated. The indexes are published by Standard & Poor's (2008b). Without going into details here, LXXR is designed to measure changes in the total value of all existing single-family housing stock in the Los Angeles Metropolitan Statistical Area, which includes Los Angeles, Long Beach, and Santa Ana.

The Los Angeles MSA index (LXXR) depicts the sharp declines in housing prices experienced in 2007 and 2008. Changes in housing prices around the Los Angeles area were much stronger than in most of the nation. Figure 1 shows a comparison between the monthly percentage changes in LXXR and the 10- and the 20-composite indexes (CSXR and SPCS20R) over the period 1992-2008. The more aggressive volatility in %D_LXXR (higher % increases and more pronounced % decreases) relative to the two composite indexes is evident.
Figure 1.
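To make the transformation concrete, the short sketch below computes the monthly percentage change %D_Xt = 100*{Ln(Xt) - Ln(Xt-1)} defined above for a price-index series. It is only an illustration of the formula, not part of the original study; the index values used are made-up numbers.

```python
# Minimal sketch of the log-percentage-change transformation used throughout the chapter.
import numpy as np

def pct_log_change(series):
    """Return 100 * (ln(X_t) - ln(X_{t-1})) for t = 2..T."""
    x = np.asarray(series, dtype=float)
    return 100.0 * np.diff(np.log(x))

lxxr = [268.4, 270.1, 265.9, 259.3]     # hypothetical monthly LXXR values
print(pct_log_change(lxxr))             # monthly %D_LXXR series
```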
… ANN, and an OLS to obtain future (predicted) values of %D_LXXR.

Use of multi-agent systems when developing models that explain dynamics of systems is not new. Chen and Yeh (1996) used GP learning of the cobweb model. Barto (1996) used ANN multi-agent reinforcement learning. Chen and Tokinaga (2006) used GP for pattern-learning. Vanstone and Finnie (2007) used ANN for developing a stock market trading system. As mentioned in the introduction, modeling %D_LXXR to forecast LXXR is completed in three stages. Before describing each stage, brief introductions to GP and ANN follow.

Genetic Programming

GP is an optimization search technique. Koza (1992) provides foundations of GP. Examples of its use in forecasting are in Chen and Yeh (1995), Tsang et al. (1998), and Warren (1994). The GP software used in this study is TSGP (for Time Series Genetic Programming), written in C++ for the Windows environment; it runs on a standard PC (Kaboudan, 2004). TSGP is used because it is designed specifically to produce regression-type models and to compute standard regression statistics. Statistical properties of models TSGP produces were analyzed in Kaboudan (2001). Two types of input files are needed for executing TSGP: data files and a configuration file. Data input files contain values of the dependent and each of the independent variables. The configuration file contains execution information a user controls. TSGP assembles an initial population of individual equations (say 1000 of them) with random specifications, computes their fitness (MSE = mean squared error), and then breeds new equations as members of a new generation with the same population size. Each individual – member of a population – is a regression-like model represented by a parse tree. The key run parameters specified in the configuration file are: population size = 1000, fitness measure = MSE, maximum tree depth = 100, maximum number of generations = 100, mutation rate = 0.6, crossover rate = 0.3, self reproduction = 0.10, and operators = +, -, x, /, sin, and cos.

GP-evolved equations are in the form of a parse tree. Trees are randomly assembled such that if an operator is selected, the tree grows. Operators are thus its inner nodes. A tree continues to grow until end nodes (or terminals) contain variables or constant terms. Once a population of equations is assembled, a new generation is then bred using mutation, crossover, and self reproduction. Fitter equations in a population get a higher chance to participate in breeding. In mutation, a randomly assembled sub-tree replaces a randomly selected existing part of a tree. In crossover, randomly selected parts of two existing trees are swapped. In self reproduction, a top percentage of the fittest individuals in one population (usually the top 10%) are passed on to the next generation. For all bred individuals, if the offspring are fitter than their parents, they survive; else the parents survive. The idea in GP is to continue generating new populations while preserving “good genes” in a Darwinian sense. After completing a specified number of generations (100 to 200), the program terminates and saves the fittest model to an output file. Actual, fitted, and forecasted values, residuals, as well as evaluation statistics (R2, MSE, and MAPE = mean absolute percent error) are written to another output file.

A GP algorithm has its characteristics. The program randomly selects the explanatory variables and the coefficients. The iterative process occasionally produces coincidental, very strange specifications. It is based on heuristics and lacks theoretical justification. Further, during execution the computerized algorithm occasionally gets trapped at a local minimum MSE in the search space and never reaches a global one. This necessitates conducting a large number of searches (say 100) to find the 100 fittest equations. One or more of them should actually produce a superior fit (and forecast) that may not be otherwise obtainable. Perhaps this explains why GP-evolved equations have strong predictive abilities.
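The sketch below is a minimal, generic illustration of the breeding cycle just described: random parse trees over the operator set {+, -, *, /, sin, cos}, MSE as the fitness measure, and new generations formed by mutation, crossover, and self-reproduction of the fittest individuals. It is not the TSGP program; its parameters, its use of simple elitism in place of the parent-versus-offspring survival rule, and all function names are illustrative assumptions.

```python
# Minimal GP sketch (not TSGP): evolve a regression-like parse tree mapping rows of
# explanatory variables X to a target y, with MSE as the fitness measure.
import math
import random

OPS = {'+': 2, '-': 2, '*': 2, '/': 2, 'sin': 1, 'cos': 1}

def random_tree(n_vars, depth=3):
    """Grow a random expression tree; leaves are variable indices or random constants."""
    if depth == 0 or random.random() < 0.3:
        return random.randrange(n_vars) if random.random() < 0.5 else random.uniform(-128, 127)
    op = random.choice(list(OPS))
    return (op,) + tuple(random_tree(n_vars, depth - 1) for _ in range(OPS[op]))

def evaluate(tree, row):
    if isinstance(tree, int):
        return row[tree]                       # terminal: explanatory variable
    if isinstance(tree, float):
        return tree                            # terminal: constant
    op, *args = tree
    v = [evaluate(a, row) for a in args]
    if op == '+': return v[0] + v[1]
    if op == '-': return v[0] - v[1]
    if op == '*': return v[0] * v[1]
    if op == '/': return v[0] / v[1] if abs(v[1]) > 1e-9 else 1.0   # protected division
    return math.sin(v[0]) if op == 'sin' else math.cos(v[0])

def mse(tree, X, y):
    try:
        return sum((evaluate(tree, r) - t) ** 2 for r, t in zip(X, y)) / len(y)
    except (OverflowError, ValueError):
        return float('inf')                    # numerically exploded trees are unfit

def subtrees(tree, path=()):
    """Yield (path, node) pairs so a random node can be picked and replaced."""
    yield path, tree
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def mutate(tree, n_vars):
    path, _ = random.choice(list(subtrees(tree)))          # random node -> random sub-tree
    return replace(tree, path, random_tree(n_vars, depth=2))

def crossover(a, b):
    pa, _ = random.choice(list(subtrees(a)))
    _, sb = random.choice(list(subtrees(b)))
    return replace(a, pa, sb)                              # graft a random sub-tree of b into a

def evolve(X, y, pop_size=200, generations=50):
    n_vars = len(X[0])
    pop = [random_tree(n_vars) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, X, y))
        elite = pop[:pop_size // 10]                       # self-reproduction: keep fittest 10%
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = random.sample(pop[:pop_size // 2], 2)  # fitter half breeds
            child = crossover(p1, p2) if random.random() < 0.3 else mutate(p1, n_vars)
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda t: mse(t, X, y))            # fittest evolved equation
```

A single call to `evolve` returns only the fittest tree of one search; the chapter's procedure instead repeats the whole search many times (say 100) and keeps the best equation from each run.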
GP and conventional statistical regressions have differences. Because equation coefficients are not computed (they are computer-generated random numbers between -128 and 127 for TSGP), problems of multicollinearity, autocorrelation, and heteroscedasticity are nonexistent. Further, there are also no degrees of freedom lost when more explanatory variables are added.

… number of epochs is increased by increments of 500 until the best network is identified. The search is repeated using networks with two hidden layers if no reasonably acceptable output is obtained. The configuration with the best estimation statistics is then used in forecasting. The fitness parameter is MSE to be consistent with GP.
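The following is a rough sketch of the kind of trial-and-error network search whose tail end survives above: a single hidden layer is tried first, the number of training epochs is raised in steps of 500, and MSE is the fitness measure. It uses scikit-learn's MLPRegressor purely as a stand-in; the layer sizes, epoch limit, and other settings are assumptions, not the chapter's actual ANN software or configuration.

```python
# Illustrative ANN search (a stand-in, not the chapter's software): grow the epoch
# budget in steps of 500 and, if needed, try two hidden layers, keeping the
# configuration with the lowest in-sample MSE.
import numpy as np
from sklearn.neural_network import MLPRegressor

def search_ann(X, y, max_epochs=5000):
    best_net, best_mse = None, np.inf
    for layers in [(5,), (5, 5)]:                         # one hidden layer, then two
        for epochs in range(500, max_epochs + 1, 500):    # epochs increased by 500
            net = MLPRegressor(hidden_layer_sizes=layers, activation='tanh',
                               max_iter=epochs, random_state=0)
            net.fit(X, y)
            fit_mse = float(np.mean((net.predict(X) - y) ** 2))   # MSE, as with GP
            if fit_mse < best_mse:
                best_net, best_mse = net, fit_mse
    return best_net, best_mse
```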
Stage I: Determining the Explanatory Variables

… and mortgage rate, and supply determinants such as construction cost and changes in inventory, for example, in the equation.

The identified complete set of explanatory variables are then Xi for i = 1, …, n possible demand and supply determinants as well as their lagged values. Lagged explanatory variables are logical since the decision-making process when buying a home tends to be rational and conditional upon verification of income, inspection of homes, and other steps in a transaction-completing routine that can take anywhere from three months to a year. To determine the Xj for j = 1, …, k (that subset of Xi) variables to select when modeling changes in LXXR, a single agent is employed. It is a GP regression-like model-assembler. That agent is given all possible variables identified as candidates to explain variation in %D_LXXR. Its task is to generate a number of promising models. Thus, given Xi, a GP agent will first evolve a reasonably good number of equations (say 200). The fittest of those equations (say the best 10%) are identified. Explanatory variables included in these equations are then harvested to be used in re-estimating models of %D_LXXR employing GP as well as other agents.

The idea of selecting the best explanatory variables using GP is new. The process starts with all the identified variables and their lagged values (with λ = 3, …, 24 monthly lags considered and included). The GP agent then generates models to explain historical variations in the dependent variable %D_LXXR. Specifications of each of the 20 best evolved models (10% of all GP models evolved) are obtained and whatever variables they contain are tabulated. Any variable in an equation gets one vote regardless of the number of times it appears in that same equation. This means that the number of votes a variable gets is the number of GP models that it appears in, and therefore, the maximum number of votes a variable can have is 20. The votes are tallied and those variables repeatedly appearing in the equations are selected to employ when searching for the final %D_LXXR forecasting model. Only those variables occurring often in the generated GP equations are reported here (given that there is no obvious benefit from discussing others). The GP agent determined that variations in %D_LXXR are best explained by: COL = cash out loans, or the percent of the amount borrowed above the amount used to finance a purchase (Freddie Mac, 2008a); FS = number of houses for sale in thousands (U.S. Census Bureau, 2008a); SOLD = number of units sold in thousands (U.S. Census Bureau, 2008a); ES = excess supply = FSt-1 – SOLDt; CMI = construction material index (U.S. Census Bureau, 2008b); CHI = change in housing inventory = FSt – FSt-1; MR = 30-year real mortgage rate (Freddie Mac, 2008b); LOAN = indexed loan = LXXR * LPR (LPR = loan-to-price ratio); LAPI = Los Angeles real personal income (U.S. Bureau of Economic Analysis, 2008); and ESDV = excess supply dummy variable, where ESDVt = 1 if ESt is below the average ES and zero otherwise. These variables are taken at different lags λ = 3, …, 24 months: COLt-λ, FSt-λ, SOLDt-λ, ESt-λ, CMIt-λ, CHIt-λ, MRt-λ, LOANt-λ, and LAPIt-λ. All variables measured in dollar values were converted into 1982 real or constant dollars using the LA metropolitan area consumer price index (CPI).

Stage II: Predicting the Explanatory Variables

In this stage, the forecast values of each explanatory variable (Xj) determined in the first stage are obtained employing two agents, GP and ANN. Agents responsible for obtaining forecasts of the Xj are each given a set of appropriate variables (Zv) to explain variations in each previously identified X variable. Alternatively, Xj = f(Zvλ), where Zvλ for v = 1, …, V explanatory variables and λ = 3, …, L lags. Results from the two agents are compared and a decision is made on whether to take the better one or take their average. The two agents are assumed competitive if the results of one of them are deemed superior to the other.
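As a concrete illustration of that comparison step, the sketch below scores two competing forecasts of an explanatory variable by MSE and keeps the better one, or averages them when neither is clearly superior. The 10% tolerance used to call the two agents comparable is an assumption made for the example; the chapter does not state such a rule.

```python
# Sketch of the Stage II "competitive" decision between two forecasting agents.
import numpy as np

def combine_forecasts(actual, forecast_a, forecast_b, tol=0.10):
    """Pick the lower-MSE forecast, or average the two if their MSEs are within tol."""
    actual, fa, fb = map(np.asarray, (actual, forecast_a, forecast_b))
    mse_a = np.mean((actual - fa) ** 2)
    mse_b = np.mean((actual - fb) ** 2)
    if abs(mse_a - mse_b) <= tol * max(mse_a, mse_b):
        return (fa + fb) / 2.0              # agents judged comparable: take their average
    return fa if mse_a < mse_b else fb      # otherwise the better agent wins
```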
Figure 2.
Figure 3. Flow chart starting with the explanatory input variables (Xj) feeding into each modeling technique. Solutions from the different models are identified by ‘S’ (GP S, ANN S, ..., etc.). Respective MSE computations determine the best model each technique produces. ANN is then used to estimate models that fit residuals output from GP and from RM.
… dynamics embedded in each variable's historical values. Two new variables were introduced in the equations above: LPR and HWR. HWR is the average monthly hourly wage in Los Angeles. Estimation statistics belonging to the models that generated forecasts of the seven explanatory variables (%D_COL, %D_MR, FS, %D_SOLD, CMI, %D_LOAN, %D_LAPI) are in Table 1. To produce lengthy forecasts, Equations (1) through (7) were used to forecast three months (the least lag in an equation) first. The three-month forecast was then used as input to solve each respective equation again to obtain the next three months' forecast, and so on. Using this iterative process, a 24-month forecast of each of the explanatory variables was produced.

Stage III: Multi-Agent Modeling of %D_LXXR

In this stage, several agents are employed to produce forecasts of %D_LXXR and LXXR. Figure 3 portrays the flow of the implementation process. Historical as well as forecasted values of the explanatory variables are input to three techniques selected to produce %D_LXXR predictions. The three techniques are GP, ANN, and linear regression models (RM or OLS). All three are multivariate and they take exogenous variables as inputs. As shown in the figure, the Xj variables feed into the model-generating techniques. Each technique acts as an agent whose task is to produce many solutions. The best solutions provided by the techniques then compete to determine the best forecast.

Solutions obtained using the different techniques remain competitive until all best solutions are identified. They then act as cooperative agents to help deliver the best possible forecast in the final step. Cooperation is in two ways. The first involves fitting the residuals (= Actual – Fitted) from one technique using a different technique. Because ANN produced the lowest MSE relative to GP and RM at the competitive level, ANN was used to fit the residuals the other two produced. The idea assumes that whatever dynamics one technique missed (i.e., the residuals) may be captured using a different technique. The second cooperation involves using all outputs obtained from the different techniques as inputs to model some type of weight distribution between them and hopefully capture what may be the best forecast.
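The first type of cooperation can be sketched as follows: the residuals left by a base agent (GP or OLS) are modeled with an ANN and the fitted residuals are added back to the base fit, yielding combinations of the GP+ANN and OLS+ANN kind reported later. The network settings below are illustrative assumptions, not the chapter's actual configuration.

```python
# Sketch of cooperative residual fitting: base fit + ANN-modeled residuals.
import numpy as np
from sklearn.neural_network import MLPRegressor

def cooperative_fit(X, y, base_fitted):
    """Return base_fitted + ANN-fitted residuals (e.g., a GP+ANN or OLS+ANN fit)."""
    residuals = np.asarray(y) - np.asarray(base_fitted)   # Actual - Fitted from the base agent
    ann = MLPRegressor(hidden_layer_sizes=(5,), activation='tanh',
                       max_iter=5000, random_state=0)
    ann.fit(X, residuals)
    return np.asarray(base_fitted) + ann.predict(X)
```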
Figure 4.
ESTIMATION RESULTS

Only the final estimation results and their statistics are presented here. Forecasts are presented in the next section. Using the same input variables, an extensive search is employed until the best possible output statistics are reached independently by each technique. ANN does not produce an estimated equation, so only the best GP and RM-OLS estimated equations can be reported.

A search consisting of a total of 100 GP models was conducted. The best (lowest MSE) GP equation among the 100 attempted produced the fit shown in Figure 4 (a) and was as follows:

Y = cos{sin[X1t-3 * {cos(X1t-6) + X1t-3}] - X1t-6} * [sin[sin(X2t-5) + {X3t-9 * sin(X2t-5 + X1t-3)}] + {X3t-9 * [{sin(X2t-5) + DVt + X3t-9} + {sin[X2t-5 + X1t-3 * {cos(X1t-6) + X1t-3}] – sin{X1t-3 * [cos{sin(X2t-5) + DVt} + X1t-3]} + X1t-6}]}] - DVt - 2 * X4t    (8)

where (for aesthetic reasons) Y = %D_LXXR, X1 = CHI, X2 = %D_SOLD, X3 = COL, X4 = %D_MR, and DV = ESDV. The equation above had R2 = 0.79 and MSE = 0.298.

As mentioned earlier, the ANN search involved a trial-and-error routine to find the best fit. Ultimately, the best model was produced by a multilayered perceptron system with a layered feedforward network trained with static backpropagation. The best results were obtained using a single hidden layer with five input processing elements, a TanhAxon transfer function, and a momentum learning rule (with step size = 1.0 and momentum = 0.7), with 6500 epochs, and were obtained after three training runs. The best ANN fit obtained is shown in Figure 4 (b). It had R2 = 0.99 and MSE = 0.006.

A trial-and-error procedure was also used to find the best OLS regression model. Explanatory variables were added and deleted and their lags varied until the best fit was found. Interestingly, only one extra explanatory variable (X5 = LOAN) was added to the list the final GP model contained. The best OLS equation found (with R2 = 0.75 and MSE = 0.36) was as follows:

Y = 7.49 - 39.41 X4t-6 - 0.74 X3t-5 + 0.55 X2t-11 + 3.40 Ln(X5t-9) - 3.87 X5t-12 - 1.47 X1t-1 + 0.42 X1t-6.    (9)
     (0.75)  (14.3)      (0.26)       (0.18)        (0.93)           (0.77)        (0.17)       (0.22)

In (9), the figures in parentheses are the estimated coefficients' standard errors. These statistics suggest that all estimated coefficients are statistically different from zero at the 5% level of significance. Figure 4 (c) compares actual with OLS fitted values.

The results from the three techniques used suggest that ANN (with the lowest MSE) should generate the best forecast. Logically then, the GP and OLS residuals (Actual – Fitted) may be predictable using ANN. Thus, to capture the GP and OLS models' unexplained variation in %D_LXXR, ANN was used to model them. The resulting fitted residuals were then added to the originally obtained %D_LXXR fit to produce the new cooperative two-agent fits and forecasts. Two ANN systems were developed using the exact same explanatory variables, with the dependent variable being the GP residuals once and the OLS residuals the second time. Here are the results from estimating the two combinations performed:

ANN estimation of the GP residuals to obtain GP+ANN: R2 = 0.97, MSE = 0.008
ANN estimation of the OLS residuals to obtain OLS+ANN: R2 = 0.95, MSE = 0.02

Additional models were then obtained by reconciliatory cooperation. Reconciliatory cooperation entails employing all results obtained thus far as input variables to produce the final fits and forecasts. Alternatively, three agents (GP, ANN, and OLS) take as inputs the five solutions produced using GP, ANN, OLS, GP+ANN, and OLS+ANN
modeling algorithms to produce new models and estimates. The final best GP-reconciliation model (GP_R) found is:

%D_LXXR = 0.4194 GPNN + 0.5806 NN    (10)

… when other variables were selected. The OLS-reconciliation model (OLS_R) also produced the best outcome when the same two variables were employed. The best OLS_R model found is:

%D_LXXR = 0.011 + 0.463 GPNN + 0.543 NN.    (11)
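Equations (10) and (11) are weighted combinations of the GP+ANN fit (GPNN) and the ANN fit (NN). The sketch below shows one generic way such combining weights can be recovered — an ordinary least squares regression of the actual series on the two fits. It illustrates the idea of reconciliatory cooperation rather than the chapter's exact estimation procedure; the data passed to it are assumed.

```python
# Sketch of a reconciliation agent: regress actual %D_LXXR on the GPNN and NN fits.
import numpy as np

def reconcile(y, gpnn_fit, nn_fit, intercept=True):
    """Return the reconciled fit and the combining weights (OLS on the two fits)."""
    cols = [np.asarray(gpnn_fit), np.asarray(nn_fit)]
    if intercept:
        cols.insert(0, np.ones(len(y)))
    A = np.column_stack(cols)
    coefs, *_ = np.linalg.lstsq(A, np.asarray(y), rcond=None)
    return A @ coefs, coefs
```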
… 0.008, and their average MSE = 0.005. All four had R2 = 0.99.

FORECAST RESULTS

Forecast results and their statistics are compared in this section. The three reconciliatory attempts produced similar forecasts. Table 2 shows the predicted monthly %D_LXXR produced by all agents and the average produced by the final three reconciliation agents. Although the rate of decline in prices is expected to decrease in 2008, prices start to increase marginally in 2009, but decline again early in 2010. Prices are expected to decrease by 7.5% in the third quarter of 2008 and decrease again by 5.56% in the fourth quarter. They are expected to increase by about 3% in the third quarter of 2009, but decrease by 3.24% during the first quarter of 2010.

Table 3 presents forecasted LXXR values obtained using the %D_LXXR predictions, where the predicted LXXR values are computed as follows:
Figure 5.
LXXRt = exp(%D_LXXRt/100 + Ln(LXXRt-1))    (12)

Given that the results of the agents are almost identical, Figure 5 presents the plots of predicted %D_LXXR and LXXR averages only. Predictions of housing price indexes in the Los Angeles metropolitan area through March of 2010 shown in Tables 2 and 3 are rather gloomy. Prices are expected to reverse to their levels during the third quarter of 2003 by the first quarter of 2010.
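The sketch below applies Equation (12) recursively to turn a sequence of predicted monthly percentage changes back into index levels, starting from the last observed value. The starting level and the three forecast numbers in it are placeholders, not values from Table 2.

```python
# Sketch of Equation (12): rebuild forecasted index levels from predicted %D changes.
import math

def rebuild_levels(last_level, pct_changes):
    """LXXR_t = exp(%D_LXXR_t / 100 + ln(LXXR_{t-1})), applied recursively."""
    levels, prev = [], last_level
    for d in pct_changes:
        prev = math.exp(d / 100.0 + math.log(prev))
        levels.append(prev)
    return levels

print(rebuild_levels(212.0, [-2.5, -1.8, -0.9]))   # hypothetical 3-month forecast
```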
Figure 6.
… strong similarity between them. Given the strong similarity between them, it is easy to conclude that the Los Angeles metropolitan area housing market will remain depressed until the end of the forecast period considered in this research. Prices will stabilize in 2009 but resume their decline early in 2010.

REFERENCES

Appleton-Young, L. (2008). 2008 real estate market forecast. California Association of Realtors. Retrieved December 2008, from http://bayareahousingreview.com/wp-content/uploads/2008/02/leslie_appleton_young_preso_read-only1.pdf.

Barto, A. (1996). Multi-agent reinforcement learning and adaptive neural networks. Retrieved December 2008, from http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA315266&Location=U2&doc=GetTRDoc.pdf.

Bostonbubble.com. (2007). S&P/Case-Shiller Boston snapshot Q3 2007. Retrieved December 2008, from http://www.bostonbubble.com/forums/viewtopic.php?t=598.

C.A.R. (2008). U.S. economic outlook: 2008. Retrieved December 2008, from http://rodomino.realtor.org/Research.nsf/files/currentforecast.pdf/$FILE/currentforecast.pdf.

Case, K., Glaeser, E., & Parker, J. (2000). Real estate and the macroeconomy. Brookings Papers on Economic Activity, 2, 119–162. doi:10.1353/eca.2000.0011

Case, K., & Shiller, R. (1989). The efficiency of the market for single-family homes. The American Economic Review, 79, 125–137.

Case, K., & Shiller, R. (1990). Forecasting prices and excess returns in the housing market. American Real Estate and Urban Economics Association Journal, 18, 263–273. doi:10.1111/1540-6229.00521

Case, K., & Shiller, R. (2003). Is there a bubble in the housing market? Brookings Papers on Economic Activity, 1, 299–342. doi:10.1353/eca.2004.0004

Chen, S., & Yeh, C. (1995). Predicting stock returns with genetic programming: Do the short-run nonlinear regularities exist? In D. Fisher (Ed.), Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics (pp. 95-101). Ft. Lauderdale, FL.

Chen, S., & Yeh, C. (1996). Genetic programming learning and the cobweb model. In Angeline, P. (Ed.), Advances in Genetic Programming (Vol. 2, pp. 443–466). Cambridge, MA: MIT Press.

Chen, X., & Tokinaga, S. (2006). Analysis of price fluctuation in double auction markets consisting of multi-agents using the genetic programming for learning. Retrieved from https://qir.kyushuu.ac.jp/dspace/bitstream/2324/8706/1/p147-167.pdf.

China Bystanders. (2008). Bank profits trimmed by subprime losses. Retrieved from http://chinabystander.wordpress.com/2008/03/25/bank-profits-trimmed-by-subprime-losses/.

CME. (2007). Retrieved December 2008, from http://www.cme.com/trading/prd/re/housing.html.

Commonwealth of Australia. (2001). Economic Outlook. Retrieved December 2008, from http://www.budget.gov.au/2000-01/papers/bp1/html/bs2.htm.

Economist.com. (2007). The world economy: Rocky terrain ahead. Retrieved December 2008, from http://www.economist.com/daily/news/displaystory.cfm?storyid=9725432&top_story=1.
Fair, R., & Jaffee, D. (1972). Methods of estimation for markets in disequilibrium. Econometrica, 40, 497–514. doi:10.2307/1913181

Freddie Mac. (2008a). CMHPI data. Retrieved December 2008, from http://www.freddiemac.com/finance/cmhpi/#old.

Freddie Mac. (2008b). 30-year fixed rate historical tables. Historical PMMS® data. Retrieved December 2008, from http://www.freddiemac.com/pmms/pmms30.htm.

CME Group. (2007). S&P/Case-Shiller Price Index: Futures and options. Retrieved December 2008, from http://housingderivatives.typepad.com/housing_derivatives/files/cme_housing_fact_sheet.pdf.

Haji, K. (2007). Subprime mortgage crisis casts a global shadow – medium-term economic forecast (FY 2007~2017). Retrieved December 2008, from http://www.nli-research.co.jp/english/economics/2007/eco071228.pdf.

Housing Predictor. (2008). Independent real estate housing forecast. Retrieved December 2008, from http://www.housingpredictor.com/california.html.

Iacono, T. (2008). Case-Shiller® Home Price Index forecasts: Exclusive house-price forecasts based on Fiserv's leading Case-Shiller Home Price Indexes. Retrieved December 2008, from http://www.economy.com/home/products/case_shiller_indexes.asp.

Kaboudan, M. (2001). Genetically evolved models and normality of their residuals. Journal of Economic Dynamics & Control, 25, 1719–1749. doi:10.1016/S0165-1889(00)00004-X

Kaboudan, M. (2004). TSGP: A time series genetic programming software. Retrieved December 2008, from http://bulldog2.redlands.edu/fac/mak_kaboudan/tsgp.

Koza, J. (1992). Genetic programming. Cambridge, MA: The MIT Press.

Ludwig, A., & Torsten, S. (2001). The impact of stock prices and house prices on consumption in OECD countries. Retrieved December 2008, from http://www.vwl.uni-mannheim.de/brownbag/ludwig.pdf.

CNN Money.com. (2008). World economy on thin ice – U.N.: The United Nations blames dire situation on the decline of the U.S. housing and financial sectors. Retrieved December 2008, from http://money.cnn.com/2008/05/15/news/international/global_economy.ap/.

Moody's Economy.com. (2008). Case-Shiller® Home Price Index forecasts. Moody's Analytics, Inc. Retrieved December 2008, from http://www.economy.com/home/products/case_shiller_indexes.asp.

National Association of Home Builders, The Housing Policy Department. (2005). The local impact of home building in a typical metropolitan area: Income, jobs, and taxes generated. Retrieved December 2008, from http://www.nahb.org/fileUpload_details.aspx?contentTypeID=3&contentID=35601&subContentID=28002.

NeuroSolutions. (2002). The Neural Network Simulation Environment, Version 3. Gainesville, FL: NeuroDimensions, Inc.

Principe, J., Euliano, N., & Lefebvre, C. (2000). Neural and Adaptive Systems: Fundamentals through Simulations. New York: John Wiley & Sons, Inc.

Standard & Poor's. (2008a). S&P/Case-Shiller® Home Price Indices Methodology. Standard & Poor's. Retrieved December 2008, from http://www2.standardandpoors.com/spf/pdf/index/SP_CS_Home_Price_Indices_Methodology_Web.pdf.
Standard & Poor's. (2008b). S&P/Case-Shiller Home Price Indices. Retrieved December 2008, from http://www2.standardandpoors.com/portal/site/sp/en/us/page.topic/indices_csmahp/2,3,4,0,0,0,0,0,0,1,1,0,0,0,0,0.html.

Stark, T. (2008). Survey of professional forecasters: May 13, 2008. Federal Reserve Bank of Philadelphia. Retrieved December 2008, from http://www.philadelphiafed.org/files/spf/survq208.html.

Tsang, E., Li, J., & Butler, J. (1998). EDDIE beats the bookies. Software: Practice and Experience, 28, 1033–1043. doi:10.1002/(SICI)1097-024X(199808)28:10<1033::AID-SPE198>3.0.CO;2-1

U.S. Bureau of Economic Analysis. (2008). Regional economic accounts: State personal income. Retrieved December 2008, from http://www.bea.gov/regional/sqpi/default.cfm?sqtable=SQ1.

U.S. Census Bureau. (2008a). Housing vacancies and home ownership. Retrieved December 2008, from http://www.census.gov/hhes/www/histt10.html.

U.S. Census Bureau. (2008b). New residential construction. Retrieved December 2008, from http://www.census.gov/const/www/newresconstindex_excel.html.

Vanstone, B., & Finnie, G. (2007). An empirical methodology for developing stockmarket trading systems using artificial neural networks. Retrieved December 2008, from http://epublications.bond.edu.au/cgi/viewcontent.cgi?article=1022&context=infotech_pubs.

Warren, M. (1994). Stock price prediction using genetic programming. In Koza, J. (Ed.), Genetic Algorithms at Stanford 1994. Stanford, CA: Stanford Bookstore.
Chapter 2
An Agent-Based Model for
Portfolio Optimization Using
Search Space Splitting
Yukiko Orito
Hiroshima University, Japan
Yasushi Kambayashi
Nippon Institute of Technology, Japan
Yasuhiro Tsujimura
Nippon Institute of Technology, Japan
Hisashi Yamamoto
Tokyo Metropolitan University, Japan
ABSTRACT
Portfolio optimization is the determination of the weights of assets to be included in a portfolio in order
to achieve the investment objective. It can be viewed as a tight combinatorial optimization problem that
has many solutions near the optimal solution in a narrow solution space. In order to solve such a tight
problem, we introduce an Agent-based Model in this chapter. We continue to employ the Information
Ratio, a well-known measure of the performance of actively managed portfolios, as an objective func-
tion. Each agent has a portfolio, its Information Ratio, and its character as a set of properties. The
evolution of agent properties splits the search space into many small spaces. In a population of one
small space, there is one leader agent and several follower agents. As the processing of the populations
progresses, the agent properties change through the interaction between the leader and the followers, and when
the iteration is over, we obtain one leader who has the highest Information Ratio.
… by a portfolio. When we attempt to solve such an optimization problem, we usually find candidates for the solutions that are better than others. The space of all feasible solutions is called the search space. There are a number of possible solutions in the search space, and finding the best solution is thus equal to finding some extreme value, minimum or maximum. In the search space, we want to find the best solution, but it is hard to do so in reasonable time as the number of assets or the number of weights of each asset grows. Because there are many possible solutions in the search space, it is usually hard for us to know where to find a solution or where to start. In order to solve such a problem, many researchers use methods based on evolutionary algorithms: for example, genetic algorithms (GA), simulated annealing, tabu search, some local searches and so on.

There are two investment objectives for portfolio management: active management and passive management. Active management is an investment strategy that seeks returns in excess of a given benchmark index. Passive management is an investment strategy that mirrors a given benchmark index. Thus, if you believe that it is possible to outperform the market, you should invest in an active portfolio. The Information Ratio and the Sharpe Ratio are well-known indices for active portfolio evaluation. On the other hand, if you think that it is not possible to outperform the market, you should invest in a passive portfolio. There are several reports that index funds that employ passive management show better performance than other mutual funds (e.g., see Elton et al., 1996; Gruber, 1996; Malkiel, 1995). The correlation between the portfolio price and the benchmark index, and Beta, are well-known indices that are used to evaluate the passive portfolio.

This optimization problem can be viewed as a discrete combinatorial problem regardless of the index we choose to evaluate the performance of active or passive portfolios. Hence, this optimization problem has two subproblems. The first one is that portfolios consisting of quite different weights of assets have similar performance values. The second problem is that there are many solutions near the optimal solution. It is hard to solve such a tight optimization problem even with strong evolutionary algorithms.

In this chapter, we propose an Agent-based Model in order to solve this tight optimization problem. In general, agent-based models describe interactions and dynamics of a group of traders in an artificial financial market (LeBaron, 2000). Our Agent-based Model is implemented as a global and local search method for the portfolio optimization problem. Our agent has a set of properties: its own portfolio, a performance value obtained by the portfolio, and its character. In the starting population, there is one leader agent, and there are many follower agents. The follower agents are categorized into three groups, namely the obedient group, the disobedient group, and the indifferent group. In the first group, the followers obediently follow the leader's behaviors. In the second group, the followers are disobedient and adopt behaviors opposite to those of the leader. In the third group, the followers determine their behaviors quite independently. As processing of the population proceeds through search space splitting, the agent properties change through the interaction between the leader and the followers, and gradually a best performing agent (the leader agent) with the highest performance value emerges as the optimal solution. Hence, our Agent-based Model has the advantage that it searches for solutions in the global space as well as in local spaces for this tight optimization problem, because plural leader agents appear and disappear during search space splitting.

The structure of the balance of this chapter is as follows: Section 2 describes related works. Section 3 defines the portfolio optimization problem and describes the performance value used to evaluate the portfolio as the objective function. In Section 4, we propose an Agent-based Model in order to optimize the portfolios. Section 5 shows
the results of numerical experiments obtained by the simulation of the Agent-based Model.

RELATED WORKS

Markowitz (1987) proposed the mean-variance methodology for portfolio optimization problems. The objective function in his methodology is to minimize the variance of the expected returns as the investment risk under the expected return as the investment return. Many researchers have extended his methodology to practical formulae and applications, and have tackled this problem by using methods based on evolutionary algorithms. Xia et al. (2000) proposed a new mean-variance model with an order of the expected returns of securities, and applied a GA to optimize the portfolios. Chang et al. (2000) proposed extended mean-variance models, and applied various evolutionary algorithms, such as GA, simulated annealing and tabu search. Lin and Liu (2008) proposed the extended mean-variance model with minimum transaction lots, and applied a GA to optimize practical portfolios. Streichert & Tanaka-Yamawaki (2006) evaluated a multi-objective evolutionary algorithm with a quadratic programming local search for the multi-objective constrained or unconstrained portfolio optimization problems. For passive portfolio optimization problems, Oh et al. (2005) showed the effectiveness of index funds optimized by a GA on the Korean Stock Exchange. Their objective function was based on beta, which is a measure of correlation between the fund's price and the benchmark index. Orito & Yamamoto (2007) and Orito et al. (2009) proposed GA methods with a heuristic local search and optimized index funds on the Tokyo Stock Exchange. Their objective functions were correlation coefficients between the fund's return rates and the changing rates of benchmark indices. Aranha & Iba (2007) also proposed a similar but different method on the Tokyo Stock Exchange.

On the other hand, agent-based models for artificial financial markets have recently been popular in research. The agent properties change by the interaction between agents, and gradually a best performing agent emerges as an optimal solution (e.g., see LeBaron et al., 1999; Chen & Yeh, 2002). Based on the idea of agent-based models, we propose an Agent-based Model for portfolio optimization in this chapter. In general, many agent-based models describe interactions and dynamics in a group of traders in an artificial financial market (LeBaron, 2000). Our Agent-based Model is implemented as a global and local search method for portfolio optimization.

PORTFOLIO OPTIMIZATION PROBLEM

First, we define the following notations for the portfolio optimization problem.

N: the total number of all the assets included in a portfolio.
i: Asset i, i = 1, …, N.
Findex(t): the value of the given benchmark index at t.
Pindex: the sequence of the rates of change of the benchmark index over t = 1, …, T. That is the vector Pindex = (Pindex(1), …, Pindex(T)) whose Pindex(t) is defined as Pindex(t) = (Findex(t+1) − Findex(t)) / Findex(t).
Fi(t): the price of Asset i at t.
Pi: the sequence of the return rates of Asset i over t = 1, …, T. That is the vector Pi = (Pi(1), …, Pi(T)) whose Pi(t) is defined as Pi(t) = (Fi(t+1) − Fi(t)) / Fi(t).
M: the total number of all the units of investment.
Mi: the unit of investment for Asset i. That is an integer such that Σi Mi = M.
wi: the weight of Asset i included in the portfolio. That is a real number wi = Mi / M (0 ≤ wi ≤ 1). Note that we do not discuss short sales here.
P_{G_k}: the sequence of the return rates of the portfolio G_k = (w_1, ..., w_N) over t = 1, ..., T. That is the vector P_{G_k} = (P_{G_k}(1), ..., P_{G_k}(T)), whose P_{G_k}(t) is defined as

$$P_{G_k}(t) = \sum_{i=1}^{N} w_i \cdot P_i(t).$$
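To make the notation above concrete, here is a small Python sketch (our illustration, not code from the chapter; the function and variable names are hypothetical) that computes the return-rate sequences P_i and the portfolio return-rate sequence P_{G_k} from price series and units of investment:

```python
import numpy as np

def rate_of_change(prices):
    """P(t) = (F(t+1) - F(t)) / F(t) for t = 1, ..., T."""
    prices = np.asarray(prices, dtype=float)
    return (prices[1:] - prices[:-1]) / prices[:-1]

def portfolio_return_rates(price_series, units):
    """P_Gk(t) = sum_i w_i * P_i(t), with weights w_i = M_i / M."""
    weights = np.asarray(units, dtype=float) / np.sum(units)
    asset_returns = np.vstack([rate_of_change(p) for p in price_series])  # N x T
    return weights @ asset_returns

# Toy example: N = 3 assets observed at T + 1 = 5 points in time.
price_series = [[100, 102, 101, 103, 104],
                [50, 51, 52, 50, 53],
                [200, 198, 202, 205, 207]]
units = [120, 50, 30]                      # M_i, integers summing to M = 200
print(portfolio_return_rates(price_series, units))
```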
It is well known that the Information Ratio, which is built on modern portfolio theory, is an index for the evaluation of active portfolios. It is defined as the active return divided by the tracking error. The active return means the amount of performance over or under a given benchmark index. The tracking error is the standard deviation of the active returns. Therefore, it is desirable to achieve a high Information Ratio.

In this chapter, we define the Information Ratio of portfolio G_k as the objective function for the portfolio optimization problem as follows:

$$\max_{G_k} IR_{G_k} = \frac{E\left[P_{G_k} - P_{index}\right]}{\sqrt{\mathrm{Var}\left[P_{G_k} - P_{index}\right]}}, \qquad (1)$$

where E[P_{G_k} - P_index] is the expected value of the historical data of the portfolio's return rates over or under the rates of change of the benchmark index, and the denominator is the corresponding standard deviation (the tracking error) computed from the same data.

In this chapter, we optimize portfolios that consist of combinations of the weights of N assets with M units in order to maximize the Information Ratio given by Equation (1). To obtain one portfolio G_k = (w_1, ..., w_N) means to obtain its Information Ratio as one solution. The number of combinations, i.e. the number of solutions, is given by (M + N - 1)! / N!(M - 1)!.

AGENT-BASED MODEL

In our Agent-based Model, the search space is split into several small spaces. Agents in one population search for solutions in one small space. One population consists of one leader agent and several follower agents. As the agents' properties evolve, a new leader appears, and then all populations in the search space are re-composed. Such re-composition of populations represents the search space splitting, as shown in Figure 1.

In this section, we describe our Agent-based Model. Section 1 defines the agent. Section 2 defines the evolutional behavior of the agents in a population. Section 3 describes the outcome of processing of the population as the search space splitting.

Agents

Figure 2 shows a typical agent in our Agent-based Model. Each agent has its own portfolio, a unit of investment of the portfolio, the Information Ratio, and its character as its properties. The character of an agent is either "obedient", "disobedient", or "independent." The agent's evolutional behavior depending on its character is described in Section 2.

Evolutional Behavior of Agents in a Population

For our Agent-based Model, let s be the number of iterations of the evolutional process. In the first (s = 1) iteration, we set W agents as the initial
[Pages 23 to 26 of the original chapter, which detail the agents' evolutional behavior and the search space splitting procedure, are not reproduced here. The surviving fragments are Equation (3), which involves E[P_{G_{k2}} - P_index] for a partial portfolio G_{k2} = (w_{i_{n+1}}, ..., w_{i_N}), and one row of the experimental settings table: "The total number of all the units of investment: M = 20000 (= N × 100)".]
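As a concrete reference for the objective function in Equation (1), the following sketch (ours, not the authors' implementation; names are hypothetical) scores a candidate portfolio by its Information Ratio against the benchmark index:

```python
import numpy as np

def information_ratio(portfolio_return_rates, index_change_rates):
    """Equation (1): expected active return divided by its standard deviation."""
    active = np.asarray(portfolio_return_rates) - np.asarray(index_change_rates)
    tracking_error = active.std()
    return active.mean() / tracking_error if tracking_error > 0 else float("-inf")
```

In the Agent-based Model described above, this value plays the role of an agent's fitness: the agent with the highest Information Ratio in the search space is the leader.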
Figure 7. Information Ratio as a function of the number of iterations of the evolutionary process
Figure 8. Information Ratio as a function of the number of selected assets for the interaction between
the leader and the follower
…is shown in Figures 9 and 10, respectively. Note that we set {b_same, b_diff} = {0.5, 0.5} for Figure 9 and {a_abd, a_dob, a_idp} = {0.6, 0.2, 0.2} for Figure 10.

From Figure 9, the Information Ratio with {a_abd, a_dob, a_idp} = {0.8, 0, 0.2} is the highest. We can conclude that the Information Ratio is high when the proportion of obedient followers in the solution space is large. This means that the effective population should consist of many obedient agents and few disobedient and independent agents. Therefore, we set {a_abd, a_dob, a_idp} to {0.8, 0, 0.2} for the experiments in the next section.

On the other hand, from Figure 10, we can observe that almost all of the Information Ratios are similar. The exceptions are the cases when each of b_same and b_diff is set to 0 or 1. This means
that the ratio of the follower agents who move to the new leader's population from the same population as the current leader or from a different one does not affect the results in our Agent-based Model.

Comparison of Agent-Based Model and GA

We compare our Agent-based Model with a GA for the combinatorial optimization problem with the total number of assets N = 200 and the total number of units of investment M = 20000. Therefore, the number of combinations is given by (20000 + 200 - 1)! / 200!(20000 - 1)!. For our model, we set the parameters as K = 500, n = 40, {a_abd, a_dob, a_idp} = {0.8, 0, 0.2} and {b_same, b_diff} = {0.5, 0.5}. On the other hand, it is well known that GA is a useful stochastic search method for such optimization problems (see e.g. Holland, 1975; Goldberg, 1989). The genetic representation of our GA is shown in Figure 11. A gene represents the weight of an asset w_i and a chromosome represents a portfolio G_k. The fitness value of the GA is defined as the Information Ratio.

In the first generation of the GA, we randomly generate the initial population. We apply the uniform crossover for exchanging partial structure between two chromosomes and repair the result to a probability distribution via renormalization. We also apply the uniform mutation for replacing partial structure of the selected chromosomes with a new random value in [0, 1] and repair to a probability distribution via renormalization. After making offspring, we apply a roulette wheel selection and an elitism method of 10% of the chromosomes
based on the fitness value. As the termination criterion of the GA, we use the generation size. Note that the population size, the generation size, the crossover rate and the mutation rate are set to values similar to those of our Agent-based Model: 100 (the same value as the total number of assets in the solution space), 500 (the same value as K), 0.8 (the same value as the ratio of obedient followers in the solution space) and 0.2 (the same value as the ratio of independent followers in the solution space), respectively. The average Information Ratio over 20 simulations obtained by our Agent-based Model and by the GA is shown in Table 2.

From Table 2, we can observe that the Information Ratios of the portfolios obtained by our Agent-based Model are higher than those of the GA for all the periods. In almost all the periods, the Information Ratios obtained by our model exceed the final results of the GA within 50 iterations of the evolutionary process.

Therefore, we can conclude that our Agent-based Model using search space splitting produces better portfolios than the simple GA. However, the results obtained by our Agent-based Model depend on the parameters: the number of selected assets for the interaction between the leader and the n followers, and the ratio of the obedient, disobedient and independent agents in the solution space, {a_abd, a_dob, a_idp}.

CONCLUSION

In this chapter, we proposed an Agent-based Model to solve the portfolio optimization problem, which is a tight optimization problem. The most notable advantage of our Agent-based Model is that our model searches for solutions in the global space as well as in local spaces.

From the numerical experiments, we conclude that our Agent-based Model using search space splitting produces better portfolios than the simple GA. However, the results obtained by our model depend on the parameters: the number of selected assets for the interaction between the leader and the n followers, and the ratio of the obedient, disobedient and independent agents in the solution space, {a_abd, a_dob, a_idp}.

Sometimes, portfolios consisting of quite different weights of assets have Information Ratios similar to those of other portfolios. We do not have reasonable explanations for this fact.
For this, it would be beneficial for us to visualize the landscape partially and to improve our Agent-based Model on the partial landscape. It is hard to visualize the landscape of solutions, however, because this problem can be viewed as a discrete combinatorial problem. In addition, we need to rebalance the portfolios in future periods in order to maintain their performance. These issues are reserved for our future work.

ACKNOWLEDGMENT

The authors are grateful to Kimiko Gosney, who gave us useful comments. The first author acknowledges partial financial support by Grant #20710119, Grant-in-Aid for Young Scientists (B) from JSPS (2008–).

REFERENCES

Aranha, C., & Iba, H. (2007). Portfolio Management by Genetic Algorithms with Error Modeling. In JCIS Online Proceedings of International Conference on Computational Intelligence in Economics & Finance.

Chang, T. J., Meade, N., Beasley, J. E., & Sharaiha, Y. M. (2000). Heuristics for Cardinality Constrained Portfolio Optimization. Computers & Operations Research, 27, 1271–1302. doi:10.1016/S0305-0548(99)00074-X

Chen, S. H., & Yeh, C. H. (2002). On the Emergent Properties of Artificial Stock Markets: The Efficient Market Hypothesis and the Rational Expectations Hypothesis. Journal of Economic Behavior & Organization, 49, 217–239. doi:10.1016/S0167-2681(02)00068-9

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley.

Gruber, M. J. (1996). Another Puzzle: The Growth in Actively Managed Mutual Funds. The Journal of Finance, 51(3), 783–810. doi:10.2307/2329222

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.

LeBaron, B. (2000). Agent-based Computational Finance: Suggested Readings and Early Research. Journal of Economic Dynamics and Control, 24, 679–702. doi:10.1016/S0165-1889(99)00022-6

LeBaron, B., Arthur, W. B., & Palmer, R. (1999). Time Series Properties of an Artificial Stock Market. Journal of Economic Dynamics and Control, 23, 1487–1516. doi:10.1016/S0165-1889(98)00081-5

Lin, C. C., & Liu, Y. T. (2008). Genetic Algorithms for Portfolio Selection Problems with Minimum Transaction Lots. European Journal of Operational Research, 185(1), 393–404. doi:10.1016/j.ejor.2006.12.024

Malkiel, B. (1995). Returns from Investing in Equity Mutual Funds 1971 to 1991. The Journal of Finance, 50, 549–572. doi:10.2307/2329419

Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7, 77–91. doi:10.2307/2975974

Markowitz, H. (1987). Mean-Variance Analysis in Portfolio Choice and Capital Markets. New York: Basil Blackwell.

Oh, K. J., Kim, T. Y., & Min, S. (2005). Using Genetic Algorithm to Support Portfolio Optimization for Index Fund Management. Expert Systems with Applications, 28, 371–379. doi:10.1016/j.eswa.2004.10.014
Orito, Y., Takeda, M., & Yamamoto, H. (2009). Index Fund Optimization Using Genetic Algorithm and Scatter Diagram Based on Coefficients of Determination. Studies in Computational Intelligence: Intelligent and Evolutionary Systems, 187, 1–11.

Orito, Y., & Yamamoto, H. (2007). Index Fund Optimization Using a Genetic Algorithm and a Heuristic Local Search Algorithm on Scatter Diagrams. In Proceedings of 2007 IEEE Congress on Evolutionary Computation (pp. 2562–2568).

Streichert, F., & Tanaka-Yamawaki, M. (2006). The Effect of Local Search on the Constrained Portfolio Selection Problem. In Proceedings of 2006 IEEE Congress on Evolutionary Computation (pp. 2368–2374).

Xia, Y., Liu, B., Wang, S., & Lai, K. K. (2000). A Model for Portfolio Selection with Order of Expected Returns. Computers & Operations Research, 27, 409–422. doi:10.1016/S0305-0548(99)00059-3

KEY TERMS AND DEFINITIONS

Portfolio Optimization: A combinatorial optimization problem that determines the proportion-weighted combination of assets in a portfolio in order to achieve an investment objective.

Information Ratio: A well-known measure of the performance of actively managed portfolios.

Agent Property: An agent has one portfolio, its Information Ratio and a character as a set of properties.

Leader Agent: An agent whose Information Ratio is the highest of all agents in the search space.

Follower Agent: An agent categorized into one of three groups, namely the obedient group, the disobedient group, and the independent group. An obedient agent is an agent that imitates a part of the leader's portfolio. A disobedient agent is an agent that does not imitate a part of the leader's portfolio. An independent agent is an agent whose behavior is not influenced by the actions of the leader agent.

Search Space Splitting: Re-composition of populations. One population consists of one leader agent and several follower agents. As agents' properties evolve, a new leader appears, and then all populations in a search space are re-composed.
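The exact evolutional rules, together with the parameters {a_abd, a_dob, a_idp}, {b_same, b_diff}, K, n and W, are given in the portions of the chapter not reproduced above, so the following Python sketch is only a simplified illustration of the population structure implied by these key terms: the best agent leads a population, obedient followers imitate part of the leader's portfolio, disobedient followers deliberately do not, independent followers ignore the leader, and populations are re-composed when leadership changes. All names, and the renormalization step, are our own assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    weights: list                 # portfolio G_k = (w_1, ..., w_N), summing to 1
    ir: float = 0.0               # Information Ratio, Equation (1)
    character: str = "obedient"   # "obedient" | "disobedient" | "independent"

def interact(leader, follower, n_selected):
    """Followers react to n_selected assets chosen from the leader's portfolio."""
    if follower.character == "independent":
        return                                        # not influenced by the leader
    idx = random.sample(range(len(leader.weights)), n_selected)
    for i in idx:
        if follower.character == "obedient":
            follower.weights[i] = leader.weights[i]   # imitate part of the leader
        else:
            follower.weights[i] = random.random()     # disobedient: do not imitate
    total = sum(follower.weights)
    follower.weights = [w / total for w in follower.weights]  # repair to sum to 1

def recompose(agents, followers_per_population):
    """Re-compose populations around the current best agents (the leaders)."""
    ranked = sorted(agents, key=lambda a: a.ir, reverse=True)
    populations = []
    while ranked:
        leader, ranked = ranked[0], ranked[1:]
        group, ranked = ranked[:followers_per_population], ranked[followers_per_population:]
        populations.append((leader, group))
    return populations
```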
Section 2
Neuro-Inspired Agents
Chapter 3
Neuroeconomics:
A Viewpoint from Agent-Based
Computational Economics
Shu-Heng Chen
National Chengchi University, Taiwan
Shu G. Wang
National Chengchi University, Taiwan
ABSTRACT
Recently, the relation between neuroeconomics and agent-based computational economics (ACE) has become an issue of concern for the agent-based economics community. Neuroeconomics is of interest to agent-based economists when they inquire into the foundations or the principles of software-agent design, normally known as agent engineering. It has been shown in many studies that the design of software agents is non-trivial and can determine what will emerge from the bottom up. Therefore, it has for some time been asked whether we can sensibly design these software agents, including both the choice of software-agent models, such as reinforcement learning, and the parameter settings associated with the chosen model, such as risk attitude. In this chapter, we start a formal inquiry by focusing on examining the models and parameters used to build software agents.
…needed to solve a particular problem using a specific decision strategy. In addition, they state that individual differences in decision behavior may be related to differences in how much effort the various elementary information processes require of the individuals.

Hedonic Psychology. Hedonic psychology is the study of what makes experiences and life pleasant or unpleasant (Kahneman, Diener, and Schwarz, 2003). It is concerned with feelings of pleasure and pain, of interest and boredom, of joy and sorrow, and of satisfaction and dissatisfaction. All decisions involve predictions of future tastes or feelings. Getting married involves a prediction of one's long-term feelings towards one's spouse; returning to school for an advanced degree involves predictions about how it will feel to be a student as well as predictions of long-term career preferences; buying a car involves a prediction of how it would feel to drive around in different cars. In each of these examples, the quality of the decision depends critically on the accuracy of the prediction; errors in predicting feelings are measured in units of divorce, dropout, career burnout and consumer dissatisfaction (Loewenstein and Schkade, 2003).

Empathy Gaps. People are often incorrect about what determines happiness, leading to prediction errors. In particular, the well-known empathy gaps, i.e., the inability to imagine opposite feelings when experiencing heightened emotion, be it happy or sad, lead to errors in predicting both feelings and behavior (Loewenstein, 2005). So, people seem to think that if disaster strikes it will take longer to recover emotionally than it actually does. Conversely, if a happy event occurs, people overestimate how long they will emotionally benefit from it.

Psychological Immune System. The cognitive bias above also indicates that agents may underestimate the proper function of their psychological immune systems. The psychological immune system is a system which helps fight off bad feelings that result from unpleasant situations (Kagan, 2006). This system is activated when humans are faced with potential or actual negative events in their life. The system functions to assist in protecting humans from extreme reactions to those negative events. Sharot, De Martino and Dolan (2008) studied how hedonic psychology affects our choices from a neural perspective. They combined participants' estimations of the pleasure they will derive from future events with fMRI data recorded while they imagined those events, both before and after making choices. It was found that activity in the caudate nucleus predicted the choice agents made when forced to choose between two alternatives they had previously rated equally. Moreover, post choice the selected alternatives were valued more strongly than pre-choice, while discarded ones were valued less. This post-choice preference change was mirrored in the caudate nucleus response. The choice-sensitive preference observed above is similar to behavior driven by reinforcement learning.

VALUE AND CHOICE

"Neuroeconomics is a relatively new discipline that studies the computations that the brain carries out in order to make value-based decisions, as well as the neural implementation of those computations. It seeks to build a biologically sound theory of how humans make decisions that can be applied in both the natural and the social sciences." (Rangel, Camerer, and Montague, 2008)

"In a choice situation, we usually look at a few alternatives, sometimes including a small number that we generate for the purpose but more often limiting ourselves to those that are already known and available. These alternatives are generated or evoked in response to specific goals or drives (i.e. specific components of the utility function), so that different alternatives are generated when we are hungry from when we are thirsty; when we are thinking about our science from when we are thinking about our children." (Simon, 2005, p. 93)
The very basic economics starts with value assignment and choice making. However, traditional economics makes little effort to understand the cognitive and computational loading involved in this very fundamental economic activity. A number of recent studies suggest that the view we used to be taught may be misplaced once we take the value-assignment problem more seriously (Iyengar and Lepper, 2000; Schwartz, 2003). These studies lead us to question the impact of the dimensionality of the choice space upon our behavior of value assignment and choice making. It seems that when the number of choices increases, the ability to make the best choice becomes problematic.

Going one step further, Louie, Grattan, and Glimcher (2008) attempt to theorize this paradox of choice by exploring the neural mechanism underlying value representation during decision-making and how such a mechanism influences choice behavior in the presence of alternative options. In their analysis, value assignment is relatively normalized when new alternatives are presented. The linear proportionate normalization is a simple example. Because value is relatively coded rather than absolutely coded, the value differences between two alternatives may become narrow when more alternatives are presented.

Intertemporal Choice

Agent-based economic models are dynamic. Time is an inevitable element, and the time preference becomes another important setting for agents in agent-based models. In mainstream economic theory, the time preference has been largely standardized as exponential discounting with a time-invariant discount rate. However, recent studies have found that people discount future outcomes more steeply when they have the opportunity for immediate gratification than when all outcomes occur in the future. This has led to the modification of declining discount rates, or hyperbolic discounting (Laibson, 1997). Frederick, Loewenstein, and O'Donoghue (2002) provided an extensive survey of the empirical studies showing that the observed discount rates are not constant over time, but appear to decline. Loewenstein (1988) has further demonstrated that discount rates can be dramatically affected by whether the change in delivery time of an outcome is framed as an acceleration or a delay from some temporal reference point. So, when asked whether they would be willing to wait for a month to receive $110 instead of receiving $100 today, most people choose $100 today. By contrast, when asked whether they would prefer to speed up the receipt of $110 in a month by receiving $100 today instead, most people exhibit patience and take the $110 in a month. This phenomenon has been used as evidence for the gain-loss asymmetry or the prospect theory. It has also been connected to the endowment effect, which predicts that people tend to value objects more highly after they come to feel that they own them (Kahneman, Knetsch and Thaler, 1990, 1991). The endowment effect explains the reluctance of people to part with assets that belong to their endowment. Nonetheless, Lerner, Small and Loewenstein (2004) show that the agents' mood, sad or neutral, can affect the appearance of this effect.

Query Theory. Recently, query theory, proposed by Johnson, Haeubl and Keinan (2007), has been used to explain this and other similar choice inconsistencies. Query theory assumes that preferences, like all knowledge, are subject to the processes and dynamics of memory encoding and retrieval, and explores whether memory and attentional processes can explain observed anomalies in evaluation and choice. Weber et al. (2007) showed that the directional asymmetry in discounting is caused by the different order in which memory is queried for reasons favoring immediate versus future consumption, with earlier queries resulting in a richer set of responses, and reasons favoring immediate consumption being generated earlier for delay vs. acceleration decisions.
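The contrast between exponential and hyperbolic discounting discussed above can be made concrete with a small numerical sketch (ours; the discount parameters are arbitrary illustrative choices, not estimates from the literature):

```python
def exponential_discount(months, monthly_rate=0.01):
    """Time-consistent discounting: D(t) = 1 / (1 + r)^t."""
    return 1.0 / (1.0 + monthly_rate) ** months

def hyperbolic_discount(months, k=0.15):
    """Hyperbolic discounting: D(t) = 1 / (1 + k*t); falls steeply for short delays."""
    return 1.0 / (1.0 + k * months)

today, delayed = 100.0, 110.0
print("exponential present value:", round(delayed * exponential_discount(1), 1))  # ~108.9 > 100
print("hyperbolic present value: ", round(delayed * hyperbolic_discount(1), 1))   # ~95.7 < 100
```

With the hyperbolic curve the delayed $110 is worth less than $100 today, matching the observed impatience for immediate rewards, while the exponential discounter would wait; note that the asymmetry between the delay and acceleration framings reported by Loewenstein (1988) is not captured by either curve alone.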
…way. While agent-based modeling relies heavily on the idea of heterogeneity, preference or risk preference in most studies is normally assumed to be homogeneous. Little has been explored on the aggregate dynamics generated by a society of agents with heterogeneous risk preferences.³ Nevertheless, it seems to be quite normal to see agents with heterogeneous risk preferences in neuroeconomic experiments (Paulsen et al., 2008).

Genetics has contributed to accounting for the differences in risk preference. Kuhnen and Chiao (2008) showed that several genes previously linked to emotional behavior and addiction are also found to be correlated with risk-taking investment decisions. They found that 5HTLPR ss allele carriers are more risk averse than those carrying the sl or ll alleles of the gene. D4DR 7-repeat allele carriers are more risk seeking than individuals without the 7-repeat allele. Individuals with the D2DR A1/A1 genotype have more stable risk preferences than those with the A1/A2 or A2/A2 genotype, while those with the D4DR 4-repeat allele have less stable preferences than people who do not have the 4-repeat allele.

One of the essential developments in neuroeconomics is to provide neural foundations for the risk preferences. It is assumed that the human brain actually follows the finance approach, encoding the various statistical inputs needed for the effective evaluation of the desirability of risky gambles. In particular, neurons in parts of the brain respond immediately (with minimal delay) to changes in expected rewards and with a short delay (about 1 to 2 seconds) to risk, as measured by the payoff variance (Preuschoff, Bossaerts and Quartz, 2006). Whether one can find evidence of higher-order risk (skewness aversion, for instance) remains an interesting issue.

Some initial studies indicate that risk preference may be context-dependent or event-driven, which, to some extent, can be triggered by how the risky environment is presented. d'Acremont and Bossaerts (2008) show that the dominance of the mean-variance preference over the expected utility depends on the number of states. When the number of states increases, it is more likely that the mean-variance preference may fit the data better than the expected utility.

LEARNING AND THE DRPE HYPOTHESIS

One essential element of agent-based computational economics is the notion of autonomous agents, i.e., agents who are able to learn and adapt on their own. It would have been a big surprise to us if neuroscience had not cared about learning. However, it will also be a surprise to us if the learning algorithms which we commonly use for software agents can actually have their neural representations. Nonetheless, a few recent studies have pointed in this direction.

Studies start with how the brain encodes the prediction error, and how other neural modules react to these errors. The most famous hypothesis in this area is the dopaminergic reward prediction error (DRPE) hypothesis. This hypothesis states that neurons that contain the neurotransmitter release dopamine in proportion to the difference between the predicted reward and the experienced reward of a particular event. Recent theoretical and experimental work on dopamine release has focused on the role that this neurotransmitter plays in learning and the resulting choice behavior. Neuroscientists have hypothesized that the role of dopamine is to update the value that humans and animals attach to different actions and stimuli, which in turn affects the probability that such an action will be chosen. If true, this theory suggests that a deeper understanding of dopamine will expand economists' understanding of how beliefs and preferences are formed, how they evolve, and how they play out in the act of choice.

Caplin and Dean (2008) formulate the DRPE hypothesis in axiomatic terms. Their treatment has precisely the revealed preference characteristic of identifying any possible reward function
directly from the observables. They discuss the potential for measured dopamine release to provide insight into belief formation in repeated games and into learning theory, e.g., reinforcement learning. Their axiomatic model specifies three easily testable conditions for the entire class of reward prediction error (RPE) models. Briefly, the axioms will be satisfied if activity is (1) increasing with prize magnitude, (2) decreasing with lottery expected value, and (3) equivalent for outcomes from all lotteries with a single possible outcome. These three conditions are both necessary and sufficient for any RPE signal. If they hold, there is a way of defining experienced and predicted reward such that the signal encodes the RPE with respect to those definitions. Rutledge et al. (2008) used the BOLD responses at the outcome time to test whether activity in the nucleus accumbens satisfies the axioms of the RPE model.

Klucharev et al. (2008) show that a deviation from the group opinion is detected by neural activity in the rostral cingulate zone (RCZ) and ventral striatum. These regions produce a neural signal similar to the prediction error signal in reinforcement learning that indicates a need for social conformity: a strong conflict-related signal in the RCZ and NAc triggers adjustment of judgments in line with the group opinion. Using an olfactory categorization task performed by rats, Kepecs, Uchida, and Mainen (2008) attempt to obtain evidence for quantitative measurements of learning increments and test the hypothesis implied by reinforcement learning, i.e., one should learn more when uncertain and less when certain.

Studies also try to find the neural representation of different learning algorithms. The commonly used reinforcement learning and Bayesian learning are compared in Bossaerts et al. (2008), where they address the existence of the dual system.⁴ They consider the reflective system and the reflexive system as the neural representation of Bayesian learning and reinforcement learning, respectively. Using the trust game, they were able to stratify subjects into two groups. One group used well-adapted strategies. EEG recordings revealed activation of a reflective (conflict-resolution) system, evidently to inhibit impulsive emotional reactions after disappointing outcomes. Pearson et al. (2008) initiated another interesting line of research, i.e., the neural representations which distinguish exploration from exploitation, the two fundamental search strategies frequently used in various intelligent algorithms, say, genetic algorithms.

DUAL SYSTEM CONJECTURE

The dual system conjecture generally refers to the hypothesis that human thinking and decision-making are governed by two different but interacting systems. This conjecture has been increasingly recognized as being influential in psychology (Kahneman, Diener, and Schwarz, 2003), neural science (McClure et al., 2004), and economics. The two systems are an affective system and a deliberative system (Loewenstein and O'Donoghue, 2005) or a reflexive system and a reflective system (Lieberman, 2003). The affective system is considered to be myopic, activated by environmental stimuli, and primarily driven by affective states. The deliberative system is generally described as being goal-oriented and forward-looking. The former is associated with the areas of the brain that we have labeled the ventral striatum (nucleus accumbens, ventral caudate, and ventral putamen), the right striatum, neostriatum and amygdala, among others, whereas the latter is associated with the areas of the brain that we have labeled the ventromedial and dorsolateral prefrontal and anterior cingulate, among others.

The dual system of the brain has become the neuroeconomic area which economic theorists take the most seriously. This has also helped with the formation of the new field known as neuroeconomic theory. A number of dual-process models have been proposed in economics with applications to intertemporal choice (Loewenstein
and O'Donoghue, 2005; Fudenberg and Levine, 2006; Brocas and Carrillo, 2008), risk preferences (Loewenstein and O'Donoghue, 2005), and social preferences (Loewenstein and O'Donoghue, 2005). All these models view economic behavior as being determined by the interaction between two different systems.

The application of the dual system conjecture to learning is just the beginning. Earlier, we mentioned the cognitive loading of different learning algorithms, such as reinforcement learning vs. Bayesian learning (see Section 4). This issue has recently been discussed in experimental economics (Charness and Levin, 2005), and now also in neuroeconomics (Bossaerts et al., 2008).

Software Agents with Neurocognitive Dual Systems

While agents with dual systems have been considered to be a new research direction in neuroeconomic theory (Brocas and Carrillo, 2008a; Brocas and Carrillo, 2008b), software agents or autonomous agents in agent-based modeling mostly follow a single system. However, the dual system interpretation exists for many agent-based economic models. Consider the fundamentalist-chartist model as an example, where the fundamentalist's and the chartist's behavior can be differentiated by the associated neural systems, say, assuming the former is associated with the deliberative system while the latter is associated with the affective system.

Another example is individual learning vs. social learning. These two learning schemes have been frequently applied to model learning behavior in experiments, and their fits to the experimental data are different (Hanaki, 2005). Agent-based simulation has also shown that their emergent patterns are different. For example, in the context of an artificial stock market, Yeh and Chen (2001) show that agents using individual learning behave differently from agents using social learning in terms of market efficiency, price dynamics and trading volume. If individual learning can be associated with, say, the deliberative system, and social learning can be connected to the affective system, then the dual system can also be applied to agent-based modeling. This issue opens the future to collaboration between agent-based economics and neuroeconomics.

FROM MODULAR MIND/BRAIN TO MODULAR PREFERENCE

At present, modularity (Simon, 1965) is still not a part of agent-based economic modeling. This absence is a little disappointing since ACE is regarded as a complement to mainstream economics in terms of articulating the mechanism of evolution and automatic discovery. One way of making progress is to enable autonomous agents to discover the modular structure of their surroundings, and hence they can adapt by using modules. This is almost equivalent to causing their "brain" or "mind" to be designed in a modular way as well.

The only available work in agent-based economic modeling which incorporates the idea of modularity is that related to the agent-based models of innovation initiated by Chen and Chie (2004). They proposed a modular economy whose demand side and supply side both have a decomposable structure. While the decomposability of the supply side, i.e., production, has already received intensive treatment in the literature, the demand side has not. Inspired by the study of neurocognitive modularity, Chen and Chie (2004) assume that the preference of consumers can be decomposable.⁵ In this way, the demand side of the modular economy corresponds to a market composed of a set of consumers with modular preferences.

In the modular economy, the assumption of modular preference is made in the form of a dual relationship with the assumption of modular production. Nevertheless, whether in reality the two can have a nice mapping, e.g., a one-to-one relationship, is an issue related to the distinction
between structural modularity and functional modularity. While in the literature this distinction has been well noticed and discussed, "recent progress in developmental genetics has led to remarkable insights into the molecular mechanisms of morphogenesis, but has at the same time blurred the clear distinction between structure and function." (Callebaut and Rasskin-Gutman, 2005, p. 10)

The modular economy considered by Chen and Chie (2004) does not distinguish between the two kinds of modularity, and they are assumed to be identical. One may argue that the notion of modularity that is suitable for preference is structural, i.e., what it is, whereas the one that is suitable for production is process, i.e., what it does. However, this understanding may be partial. Using the LISP (LISt Processing) parse-tree representation, Chen and Chie (2004) have actually integrated the two kinds of modularity. Therefore, consider drinking coffee with sugar as an example. Coffee and sugar are modules for both production and consumption. Nevertheless, for the former, producers add sugar to coffee to deliver the final product, whereas for the latter, the consumers drink the mixture knowing of the existence of both components or by "seeing" the development of the product.

Chen and Chie (2007) tested the idea of augmented genetic programming (augmented with automatically defined terminals) in a modular economy. Chen and Chie (2007) considered an economy with two oligopolistic firms. While both of these firms are autonomous, they are designed differently. One firm is designed with simple GP (SGP), whereas the other firm is designed with augmented GP (AGP). These two different designs match the two watchmakers considered by Simon (1965). The modular preferences of consumers define not only the search space for firms, but also a search space with different hierarchies. While it is easier to meet consumers' needs with very low-end products, the resulting profits are negligible. To gain higher profits, firms have to satisfy consumers up to higher hierarchies. However, consumers become more and more heterogeneous when their preferences are compared at higher and higher hierarchies, which calls for a greater diversity of products.⁶ It can then be shown that the firm using a modular design performs better than the firm not using a modular design, as Simon predicted.

CONCLUDING REMARKS: AGENT BASED OR BRAIN BASED?

Can we relate agent-based economics to brain-based economics (neuroeconomics)? Can we use the knowledge which we obtain from neuroeconomics to design software agents? One of the features of agent-based economics is the emphasis on the heterogeneity of agents. This heterogeneity may come from behavioral genetics. Research has shown that genetics has an effect on our risk preference. Kuhnen and Chiao (2008), Jamison et al. (2008), and Weber et al. (2008) show that preferences are affected by genes and/or education (environment). With the knowledge of genetics and neuroeconomics, the question is: how much more heterogeneity do we want to include in agent-based modeling? Does it really matter?

Heterogeneity may also result from age. The neuroeconomic evidence shows that certain functions of the brain will age. The consequence is that elderly people will make some systematic errors more often than young people, and age will affect financial decisions as well (Samanez Larkin, Kuhnen, and Knutson, 2008). Thus the same question arises: when engaging in agent-based modeling, should we take age heterogeneity into account? So, when a society ages, should we constantly adjust our agent-based model so that it can match the empirical age distribution of the society? So far we have not seen any agent-based modeling that features the aspect of aging.

Neuroeconomics does encourage the modular design of agents, because our brain is a modular structure. Many different modules in the brain
have been identified. Some modules are related to emotion, some are related to cognition, and some are related to self-control. When human agents are presented with different experimental settings, we often see different combinations of these modules.

ACKNOWLEDGMENT

The author is grateful for the financial support provided by the NCCU Top University Program. NSC research grant No. 95-2415-H-004-002-MY3 is also gratefully acknowledged.

REFERENCES

Baldassarre, G. (2007, June). Research on brain and behaviour, and agent-based modelling, will deeply impact investigations on well-being (and theoretical economics). Paper presented at International Conference on Policies for Happiness, Certosa di Pontignano, Siena, Italy.

Bossaerts, P., Beierholm, U., Anen, C., Tzieropoulos, H., Quartz, S., de Peralta, R., & Gonzalez, S. (2008, September). Neurobiological foundations for "dual system" theory in decision making under uncertainty: fMRI and EEG evidence. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Brocas, I., & Carrillo, J. (2008a). The brain as a hierarchical organization. The American Economic Review, 98(4), 1312–1346. doi:10.1257/aer.98.4.1312

Brocas, I., & Carrillo, J. (2008b). Theories of the mind. American Economic Review: Papers & Proceedings, 98(2), 175–180.

Callebaut, W., & Rasskin-Gutman, D. (Eds.). (2005). Modularity: Understanding the development and evolution of natural complex systems. Cambridge, MA: MIT Press.

Caplin, A., & Dean, M. (2008). Economic insights from "neuroeconomic" data. The American Economic Review, 98(2), 169–174. doi:10.1257/aer.98.2.169

Charness, G., & Levin, D. (2005). When optimal choices feel wrong: A laboratory study of Bayesian updating, complexity, and affect. The American Economic Review, 95(4), 1300–1309. doi:10.1257/0002828054825583

Chen, S.-H. (2008). Software-agent designs in economics: An interdisciplinary framework. IEEE Computational Intelligence Magazine, 3(4), 18–22. doi:10.1109/MCI.2008.929844

Chen, S.-H., & Chie, B.-T. (2004). Agent-based economic modeling of the evolution of technology: The relevance of functional modularity and genetic programming. International Journal of Modern Physics B, 18(17-19), 2376–2386. doi:10.1142/S0217979204025403

Chen, S.-H., & Chie, B.-T. (2007). Modularity, product innovation, and consumer satisfaction: An agent-based approach. In Yin, H., Tino, P., Corchado, E., Byrne, W., & Yao, X. (Eds.), Intelligent Data Engineering and Automated Learning (pp. 1053–1062). Heidelberg, Germany: Springer. doi:10.1007/978-3-540-77226-2_105

Chen, S.-H., & Huang, Y.-C. (2008). Risk preference, forecasting accuracy and survival dynamics: Simulations based on a multi-asset agent-based artificial stock market. Journal of Economic Behavior & Organization, 67(3), 702–717. doi:10.1016/j.jebo.2006.11.006

d'Acremont, M., & Bossaerts, P. (2008, September). Grasping the fundamental difference between expected utility and mean-variance theories. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Feldman, J. (1962). Computer simulation of cognitive processes. In Borko, H. (Ed.), Computer applications in the behavioral sciences. Upper Saddle River, NJ: Prentice Hall.

Figner, B., Johnson, E., Lai, G., Krosch, A., Steffener, J., & Weber, E. (2008, September). Asymmetries in intertemporal discounting: Neural systems and the directional evaluation of immediate vs future rewards. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Fischhoff, B. (1991). Value elicitation: Is there anything in there? The American Psychologist, 46, 835–847. doi:10.1037/0003-066X.46.8.835

Frederick, S., Loewenstein, G., & O'Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, XL, 351–401. doi:10.1257/002205102320161311

Fudenberg, D., & Levine, D. (2006). A dual-self model of impulse control. The American Economic Review, 96(5), 1449–1476. doi:10.1257/aer.96.5.1449

Hanaki, N. (2005). Individual and social learning. Computational Economics, 26, 213–232. doi:10.1007/s10614-005-9003-5

Iyengar, S., & Lepper, M. (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79(6), 995–1006. doi:10.1037/0022-3514.79.6.995

Jamison, J., Saxton, K., Aungle, P., & Francis, D. (2008). The development of preferences in rat pups. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Jevons, W. (1879). The Theory of Political Economy, 2nd Edition. Edited and introduced by R. Black (1970). Harmondsworth: Penguin.

Johnson, E., Haeubl, G., & Keinan, A. (2007). Aspects of endowment: A query theory account of loss aversion for simple objects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 461–474. doi:10.1037/0278-7393.33.3.461

Kagan, H. (2006). The Psychological Immune System: A New Look at Protection and Survival. Bloomington, IN: AuthorHouse.

Kahneman, D., Diener, E., & Schwarz, N. (Eds.). (2003). Well-Being: The Foundations of Hedonic Psychology. New York, NY: Russell Sage Foundation.

Kahneman, D., Knetsch, J., & Thaler, R. (1990). Experimental tests of the endowment effect and the Coase theorem. The Journal of Political Economy, 98, 1325–1348. doi:10.1086/261737

Kahneman, D., Knetsch, J., & Thaler, R. (1991). Anomalies: The endowment effect, loss aversion, and status quo bias. The Journal of Economic Perspectives, 5(1), 193–206.

Kahneman, D., Ritov, I., & Schkade, D. (1999). Economic preferences or attitude expressions? An analysis of dollar responses to public issues. Journal of Risk and Uncertainty, 19, 203–235. doi:10.1023/A:1007835629236

Kepecs, A., Uchida, N., & Mainen, Z. (2008, September). How uncertainty boosts learning: Dynamic updating of decision strategies. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Klucharev, V., Hytonen, K., Rijpkema, M., Smidts, A., & Fernandez, G. (2008, September). Neural mechanisms of social decisions. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Kuhnen, C., & Chiao, J. (2008, September). Genetic determinants of financial risk taking. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Laibson, D. (1997). Golden eggs and hyperbolic discounting. The Quarterly Journal of Economics, 112(2), 443–477. doi:10.1162/003355397555253

Lerner, J., Small, D., & Loewenstein, G. (2004). Heart strings and purse strings: Carryover effects of emotions on economic transactions. Psychological Science, 15, 337–341. doi:10.1111/j.0956-7976.2004.00679.x

Lichtenstein, S., & Slovic, P. (Eds.). (2006). The Construction of Preference. Cambridge, UK: Cambridge University Press. doi:10.1017/CBO9780511618031

Lieberman, M. (2003). Reflective and reflexive judgment processes: A social cognitive neuroscience approach. In Forgas, J., Williams, K., & von Hippel, W. (Eds.), Social Judgments: Explicit and Implicit Processes (pp. 44–67). New York, NY: Cambridge University Press.

Lin, C.-H., Chiu, Y.-C., Lin, Y.-K., & Hsieh, J.-C. (2008, September). Brain maps of Soochow Gambling Task. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Lo, A. (2005). Reconciling efficient markets with behavioral finance: The adaptive market hypothesis. The Journal of Investment Consulting, 7(2), 21–44.

Loewenstein, G. (1988). Frames of mind in intertemporal choice. Management Science, 34, 200–214. doi:10.1287/mnsc.34.2.200

Loewenstein, G. (2005). Hot-cold empathy gaps and medical decision making. Health Psychology, 24(4), S49–S56. doi:10.1037/0278-6133.24.4.S49

Loewenstein, G., & O'Donoghue, T. (2005). Animal spirits: Affective and deliberative processes in economic behavior. Working Paper. Carnegie Mellon University, Pittsburgh.

Loewenstein, G., & Schkade, D. (2003). Wouldn't it be nice?: Predicting future feelings. In Kahneman, D., Diener, E., & Schwarz, N. (Eds.), Well-Being: The Foundations of Hedonic Psychology (pp. 85–105). New York, NY: Russell Sage Foundation.

Louie, K., Grattan, L., & Glimcher, P. (2008). Value-based gain control: Relative reward normalization in parietal cortex. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

MacLean, P. (1990). The Triune Brain in Evolution: Role in Paleocerebral Function. New York, NY: Plenum Press.

McClure, S., Laibson, D., Loewenstein, G., & Cohen, J. (2004). Separate neural systems value immediate and delayed monetary rewards. Science, 306, 503–507. doi:10.1126/science.1100907

Mohr, P., Biele, G., & Heekeren, H. (2008, September). Distinct neural representations of behavioral risk and reward risk. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Paulsen, D., Huettel, S., Platt, M., & Brannon, E. (2008, September). Heterogeneity in risky decision making in 6-to-7-year-old children. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Payne, J., Bettman, J., & Johnson, E. (1993). The Adaptive Decision Maker. New York, NY: Cambridge University Press.

Pearson, J., Hayden, B., Raghavachari, S., & Platt, M. (2008). Firing rates of neurons in posterior cingulate cortex predict strategy-switching in a k-armed bandit task. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Preuschoff, K., Bossaerts, P., & Quartz, S. (2006). Neural differentiation of expected reward and risk in human subcortical structures. Neuron, 51(3), 381–390. doi:10.1016/j.neuron.2006.06.024

Pushkarskaya, H., Liu, X., Smithson, M., & Joseph, J. (2008, September). Neurobiological responses in individuals making choices in uncertain environments: Ambiguity and conflict. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Rangel, A., Camerer, C., & Montague, R. (2008). A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience, 9, 545–556. doi:10.1038/nrn2357

Rutledge, R., Dean, M., Caplin, A., & Glimcher, P. (2008, September). A neural representation of reward prediction error identified using an axiomatic model. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Samanez Larkin, G., Kuhnen, C., & Knutson, B. (2008). Financial decision making across the adult life span. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Schwartz, B. (2003). The Paradox of Choice: Why More Is Less. New York, NY: Harper Perennial.

Sharot, T., De Martino, B., & Dolan, R. (2008, September). Choice shapes, and reflects, expected hedonic outcome. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Simon, H. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69, 99–118. doi:10.2307/1884852

Simon, H. (1956). Rational choice and the structure of the environment. Psychological Review, 63, 129–138. doi:10.1037/h0042769

Simon, H. (1965). The architecture of complexity. General Systems, 10, 63–76.

Simon, H. (2005). Darwinism, altruism and economics. In K. Dopfer (Ed.), The Evolutionary Foundations of Economics (pp. 89–104). Cambridge, UK: Cambridge University Press.

Slovic, P. (1995). The construction of preference. The American Psychologist, 50, 364–371. doi:10.1037/0003-066X.50.5.364

Weber, B., Schupp, J., Reuter, M., Montag, C., Siegel, N., Dohmen, T., et al. (2008). Combining panel data and genetics: Proof of principle and first results. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Weber, E., Johnson, E., Milch, K., Chang, H., Brodscholl, J., & Goldstein, D. (2007). Asymmetric discounting in intertemporal choice: A query-theory account. Psychological Science, 18, 516–523. doi:10.1111/j.1467-9280.2007.01932.x

Yeh, C.-H., & Chen, S.-H. (2001). Market diversity and market efficiency: The approach based on genetic programming. Journal of Artificial Simulation of Adaptive Behavior, 1(1), 147–165.

ENDNOTES

1. See also Baldassarre (2007). While it has a sharp focus on the economics of happiness, the idea of building economic agents upon the empirical findings of psychology and neuroscience and placing these agents in an agent-based computational framework is the same as what we argue here. From Baldassarre (2007), the reader may also find a historical development of the cardinal utility and ordinal utility in economics. It has been a while since economists first considered that utility is a very subjective thing which cannot be measured in a scientific way, so that interpersonal comparison of utility is impossible, which further causes any redistribution policy to lose its ground.
2. It is not clear where preferences come from, i.e., their formation and development process, nor by when in time they come to their steady state and become fixed. Some recent behavioral studies have even asserted that people do not have preferences, in the sense in which that term is used in economic theory (Kahneman, Ritov, and Schkade, 1999).
3. For an exception, see Chen and Huang (2008).
4. See Section 5 for the dual system conjecture.
5. Whether one can build preference modules upon the brain/mind modules is of course an issue deserving further attention.
6. If the consumers' preferences are randomly generated, then it is easy to see this property through combinatoric mathematics. On the other hand, in the parlance of economics, moving along the hierarchical preferences means traveling through different regimes, from a primitive manufacturing economy to a quality service economy, from the mass production of homogeneous goods to the limited production of massive quantities of heterogeneous customized products.
Chapter 4
Agents in Quantum and
Neural Uncertainty
Germano Resconi
Catholic University Brescia, Italy
Boris Kovalerchuk
Central Washington University, USA
ABSTRACT
This chapter models quantum and neural uncertainty using the concept of Agent-based Uncertainty Theory (AUT). The AUT is based on the complex fusion of crisp (non-fuzzy) conflicting judgments of agents. It provides a uniform representation and an operational empirical interpretation for several uncertainty theories such as rough set theory, fuzzy set theory, evidence theory, and probability theory. The AUT models conflicting evaluations that are fused in the same evaluation context. This agent approach also gives a novel definition of quantum uncertainty and of quantum computations for quantum gates that are realized by unitary transformations of the state. In the AUT approach, unitary matrices are interpreted as logic operations in logic computations. We show that by using permutation operators any type of complex classical logic expression can be generated. With the quantum gate, we introduce classical logic into the quantum domain. This chapter connects the intrinsic irrationality of the quantum system and the non-classical quantum logic with the agents. We argue that AUT can help to find meaning for the quantum superposition of non-consistent states. Next, this chapter shows that the neural fusion at the synapse can be modeled by the AUT in the same fashion. The neuron is modeled as an operator that transforms classical logic expressions into many-valued logic expressions. The motivation for such a neural network is to provide high flexibility and logic adaptation of the brain model.
…conflicting evaluations of the same attribute. It models conflicting evaluations that are fused in the same evaluation context. If only one evaluation is allowed for each statement in each context (world), as in modal logic, then there is no logical uncertainty. The situation that the AUT models is inconsistent (fuzzy) and is very far from the situation modeled by traditional logic, which assumes consistency. We argue that the AUT, by incorporating such inconsistent statements, is able to model different types of conflicts and their fusion known in many-valued logics, fuzzy logic, probability theory and other theories.

This chapter shows how the agent approach can be used to give a novel definition of quantum uncertainty and of quantum computations for quantum gates that are realized by unitary transformations of the state. In the AUT approach, unitary matrices are interpreted as logic operations in logic computations. It is shown that, by using permutation operators, which are unitary matrices, any type of complex classical logic expression can be generated. The classical logic has well-known difficulties in quantum mechanics. Now, with the quantum gate, we introduce classical logic into the quantum domain. We connect the intrinsic irrationality of the quantum system and the non-classical quantum logic with the agents. We argue that Agent-based Uncertainty Theory (AUT) can help to find meaning for the quantum superposition of non-consistent states, for which one particle can be at different points at the same time or the same particle can have spin up and spin down at the same time.

Next, this chapter shows that the neural fusion at the synapse can be modeled by the AUT. Agents in the neural network are represented by logic input values in the neuron itself. In ordinary neural networks any neuron is a processor that models a Boolean function. We change the point of view and consider a neuron as an operator that transforms classical logic expressions into many-valued logic expressions or, in other words, changes crisp sets into fuzzy sets. This neural network consists of neurons at two layers. At the first layer, neurons or agents implement the classical logic operations. At the second layer, neurons or nagents (neuron agents) compute the same logic expression with different results. These are many-valued neurons that fuse the results provided by different agents at the first layer. They fuse conflicting or inconsistent situations. The network is based on the use of the logic of uncertainty instead of the classical logic. The motivation for such a neural network is to provide high flexibility and logic adaptation of the brain model. In this brain model, communication among agents is specified by the fusion process in the neural elaboration.

The probability calculus does not incorporate explicitly the concepts of irrationality or logic conflict of an agent's state. It misses structural information at the level of individual objects, but preserves global information at the level of a set of objects. Given a die, probability theory studies frequencies of the different faces E = {e} as independent (elementary) events. This set of elementary events E has no structure. It is only required that the elements of E are mutually exclusive and complete, that is, no other alternative is possible. The order of its elements is irrelevant to the probabilities of each element of E. No irrationality or conflict is allowed in this definition relative to mutual exclusion. The classical probability calculus does not provide a mechanism for modelling uncertainty when agents communicate (collaborate or conflict). Recent work by Halpern (2005) is an important attempt to fill this gap.

This chapter is organized as follows: Sections 2 and 3 provide a summary of the AUT starting from concepts and definitions. Section 4 presents links between quantum mechanics and first order conflicts in the AUT. Section 5 discusses the neural images of the AUT. Section 6 concludes this chapter.
A set of agents G is in the first order of conflict if

G(A>B) ∩ G(A<B) = ∅,   G(A>B) ≠ ∅,   G(A>B) ∪ G(A<B) = G.

For example, four agents g1, …, g4 may evaluate the proposition p under four conditions C1, …, C4 as follows:

v(p) =
        g1      g2      g3      g4
  C1    true    false   false   true
  C2    false   false   true    true
  C3    true    false   true    true
  C4    false   false   true    true

If a single decision must be made at the first order of conflict, then we must introduce a fusion process for the logic values of the proposition p given by all agents. A basic way to do this is to compute the weighted frequency of the logic values given by all agents:

          g1   g2   …   gn−1   gn
v(p) =                                  (3)
          v1   v2   …   vn−1   vn
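A minimal sketch of this fusion step, assuming equal weights w_k = 1/n and a Boolean encoding of the judgments (illustrative, not the authors' implementation):

```python
# Fuse crisp, possibly conflicting agent judgments v_k(p) into a many-valued
# value: the weighted frequency of "true", with weights summing to 1.
def fuse(values, weights=None):
    n = len(values)
    weights = weights or [1.0 / n] * n
    return sum(w for v, w in zip(values, weights) if v)

# Four agents g1..g4 judging p under condition C1 (true, false, false, true):
print(fuse([True, False, False, True]))     # 0.5
```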
…the initial uncertainty values and do not violate the mutual exclusion. Below we define vector logic operations for the first-order-of-conflict logic states v(p).

Definition. With agent weights satisfying ∑_{k=1}^{n} w_k = 1, the vector logic operations are defined componentwise:

v(p ∧ q) = ⟨ v1(p) ∧ v1(q), …, vn(p) ∧ vn(q) ⟩,
v(p ∨ q) = ⟨ v1(p) ∨ v1(q), …, vn(p) ∨ vn(q) ⟩,
v(¬p) = ⟨ ¬v1(p), …, ¬vn(p) ⟩,

where the symbols ∧, ∨, ¬ on the right side of the equations are the classical AND, OR, and NOT operations. Written with explicit indication of the agents (in the first row), these operations are

            g1               g2               …   gn
v(p ∧ q) =  v1(p) ∧ v1(q)    v2(p) ∧ v2(q)    …   vn(p) ∧ vn(q)

            g1               g2               …   gn
v(p ∨ q) =  v1(p) ∨ v1(q)    v2(p) ∨ v2(q)    …   vn(p) ∨ vn(q)

            g1        g2        …   gn
v(¬p)    =  ¬v1(p)    ¬v2(p)    …   ¬vn(p)

If G is a set of agents at the first order of conflict and |G(p)| ≤ |G(q)|, then the ∧ and ∨ logic operations satisfy the following properties:

|G(p ∧ q)| = min(|G(p)|, |G(q)|) − |G(¬q ∧ p)|,
|G(p ∨ q)| = max(|G(p)|, |G(q)|) + |G(¬q ∧ p)|,   (5)

and also

|G(p ∨ q)| = |G(p) ∪ G(q)|,   |G(p ∧ q)| = |G(p) ∩ G(q)|.

Corollary 1 (min/max properties of the ∧ and ∨ operations for nested sets of agents). If G is a set of agents at the first order of conflict such that G(q) ⊂ G(p) or G(p) ⊂ G(q), then …
Figure 2. A set of 10 agents with two different splits into the G(p) and G(q) subsets, (a) and (b)
G(p ∧ ¬p) = G(p) ∩ G(¬p) = G(p) ∩ Gᶜ(p) = ∅.

This follows from the definition of the first order of conflict and Statement 2. In other words, G(p ∧ ¬p) = ∅ corresponds to the contradiction p ∧ ¬p, which is always false, and G(p ∨ ¬p) = G corresponds to the tautology p ∨ ¬p, which is always true in the first order of conflict.

Let G1 ⊕ G2 be the symmetric difference of the sets of agents G1 and G2,

G1 ⊕ G2 = (G1 ∩ G2ᶜ) ∪ (G1ᶜ ∩ G2),

and let p ⊕ q be the exclusive or of the propositions p and q,

p ⊕ q = (p ∧ ¬q) ∨ (¬p ∧ q).

Consider the set of agents G(p ⊕ q). It consists of the agents for which the values of p and q differ from each other, that is,

G(p ⊕ q) = G((p ∧ ¬q) ∨ (¬p ∧ q)).

Below we use the number of agents in the set G(p ⊕ q) to define a measure of difference between the statements p and q and a measure of difference between the sets of agents G(p) and G(q).

Definition. A measure of difference D(p, q) between statements p and q and a measure of difference D(G(p), G(q)) between sets of agents G(p) and G(q) are defined as follows:

D(p, q) = D(G(p), G(q)) = |G(p) ⊕ G(q)|.

Statement 4. D(p, q) = D(G(p), G(q)) is a distance, i.e., it satisfies the distance axioms:

D(p, q) ≥ 0,
D(p, q) = D(q, p),
D(p, q) + D(q, h) ≥ D(p, h).

This follows from the properties of the symmetric difference ⊕ (e.g., Flament, 1963).

Figure 2 illustrates a set of agents G(p) for which p is true and a set of agents G(q) for which q is true. In Figure 2(a) the number of agents for which the truth values of p and q differ, (¬p ∧ q) ∨ (p ∧ ¬q), is equal to 2. These agents are represented by white squares. Therefore the distance between G(p) and G(q) is 2. Figure 2(b) shows other G(p) and G(q) sets, with the number of agents for which (¬p ∧ q) ∨ (p ∧ ¬q) is true equal to 6 (agents shown as white squares and squares with the grid). Thus, the distance between the two sets is 6.

In Figure 2(a), Set 2 consists of 2 agents, |G(p ∧ ¬q)| = 2, and Set 4 is empty, |G(¬p ∧ q)| = 0, thus D(Set 2, Set 4) = 2. This emptiness means that the set of agents with true p includes the set of agents with true q. In Figure 2(b), Set 2 consists of 4 agents, |G(p ∧ ¬q)| = 4, and Set 4 consists of 2 agents, |G(¬p ∧ q)| = 2, thus D(Set 2, Set 4) = 6.

These splits produce different distances between G(p) and G(q). The distance in case (a) is equal to 2; the distance in case (b) is equal to 6. Set 1 (black circles) consists of agents for which both p and q are false, Set 2 (white squares) consists of agents for which p is true but q is false, Set 3 (white circles) consists of agents for which p and q are both true, and Set 4 (squares with grids) consists of agents for which p is false and q is true.
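The two splits of Figure 2 can be replayed directly with Python sets; the particular agent memberships below are illustrative, only the set sizes (|Set 2| and |Set 4|) follow the text:

```python
# D(p, q) = |G(p) (+) G(q)|, the size of the symmetric difference, over 10 agents.
def distance(Gp, Gq):
    return len(Gp ^ Gq)

# Split (a): G(q) is contained in G(p); 2 agents hold p true but q false.
Gp_a, Gq_a = {1, 2, 3, 4, 5, 6}, {1, 2, 3, 4}
# Split (b): 4 agents hold p true but q false, 2 hold p false but q true.
Gp_b, Gq_b = {1, 2, 3, 4, 5, 6}, {5, 6, 7, 8}

print(distance(Gp_a, Gq_a))   # 2
print(distance(Gp_b, Gq_b))   # 6
```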
For complex computations of the logic values provided by the agents we can use a graph of the distances among the sentences p1, p2, …, pN. For example, for three sentences p1, p2 and p3 we obtain the distance matrix, which has the general form of a symmetric matrix:

      | 0      D1,2   D1,3 |
D  =  | D1,2   0      D2,3 |
      | D1,3   D2,3   0    |

Having the distances Dij between propositions, we can use them to compute complex expressions in the agents' logic operations. For instance, we can use 0 ≤ D(p, q) ≤ |G(p)| + |G(q)| and

G(p ∧ q) ≡ G(p ∨ q) − G((¬p ∧ q) ∨ (p ∧ ¬q)) ≡ G(p ∨ q) − D(p, q).

Respectively, the belief Bel(Ω) and plausibility Pl(Ω) measures for any set Ω are defined as

Bel(Ω) = Σ_{Ak ⊆ Ω} m(Ak),   Pl(Ω) = Σ_{Ak ∩ Ω ≠ ∅} m(Ak).

Thus, the Bel measure includes all subsets that lie inside Ω, while the Pl measure includes all sets with a non-empty intersection with Ω. In evidence theory, as in probability theory, we associate one and only one agent with an element.

Figure 4 shows a set A at the border of the set Ω, divided in two parts: one inside Ω and another outside it. For the belief measure we exclude the set A; thus we exclude the false state (f, f) and the logically self-conflicting state (t, f). For the plausibility measure we accept the state (t, t) and the self-conflicting state (f, t). In the belief measure we exclude any possible self-conflicting state, but in the plausibility measure we accept self-conflicting states.

There are two different criteria to compute the belief and plausibility measures in evidence theory. For belief, criterion C1 is related to the set Ω, thus C1 is true for the cases inside Ω. The second criterion, C2, is related to the set A, thus C2 is false for the cases inside A. Now we can put in evidence a logically self-conflicting state (t, f), where we eliminate A also if it is inside Ω. The same applies to the plausibility, where C2 is true for cases inside A.
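A minimal sketch of the two sums, with an illustrative mass assignment m over focal sets:

```python
# Belief sums the masses of focal sets contained in Omega; plausibility sums
# the masses of focal sets that intersect Omega.
def bel(omega, m):
    return sum(mass for A, mass in m.items() if A <= omega)

def pl(omega, m):
    return sum(mass for A, mass in m.items() if A & omega)

m = {frozenset({1}): 0.5, frozenset({1, 2}): 0.3, frozenset({2, 3}): 0.2}
omega = frozenset({1})
print(bel(omega, m), pl(omega, m))    # 0.5 0.8  (Bel never exceeds Pl)
```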
Figure 5. Example of irrational agents for rough set theory

In this situation, we accept a logically self-conflicting situation (f, t), where cases in Ω are false but cases in A are true. We also use the logical self-conflict to study rough sets, as shown in Figure 5. Figure 5 shows a set Ω that includes the logically self-conflicting states (f, t). The internal and external parts of Ω are separated by a non-empty frontier, where the couples [t, t] are present as well as the self-conflicting couples [f, t].

QUANTUM MECHANICS: SUPERPOSITION AND FIRST ORDER CONFLICTS

The Mutual Exclusion (ME) principle states that an object cannot be in two different locations at the same time. We call this the locality phenomenon. In quantum mechanics, we have the non-locality phenomenon, for which the same particle can be in many different positions at the same time. This is a clear violation of the ME principle for location. Non-locality is essential for quantum phenomena. Given the proposition p = "The particle is in position x in the space", agent gi can say that p is true, but agent gj can say that the same p is false.

Individual states of the particle include its position, momentum and others. The complete state of the particle is a superposition of the quantum states of the particle, with weights wi at the different positions xi in space. This superposition can be a global wave function:

ψ = w1 |x1⟩ + w2 |x2⟩ + … + wn |xn⟩,

where |xi⟩ denotes the state at the position xi (D'Espagnat, 1999). If we limit our consideration to an individual quantum state of the particle and ignore the other states, then we are in the Mutual Exclusion situation of classical logic (the particle is at xi or it is not). In the same way in the AUT, when we observe only one agent at a time the situation collapses to classical logic, which may not adequately represent many-valued logic properties. Having multiple states associated with the particle, we use AUT multi-valued logic as a way to …
Quantum Computer

The modern concept of the quantum computer is based on the following statements (Abbott, Doering, Caves, Lidar, Brandt, Hamilton, Ferry, Gea-Banacloche, Bezrukov, & Kish, 2003; DiVincenzo, 2000; DiVincenzo, 1995; Feynman, 1982; Jaeger, 2006; Nielsen & Chuang, 2000; Benenti, 2004; Stolze & Suter, 2004; Vandersypen, Yannoni, & Chuang, 2000; Hiroshi & Masahito, 2006):

1. Any state denoted as |xi⟩ is a field of complex numbers on the reference space (position and time). In our notation, the quantum state is a column of values for different points (objects).
2. Any combination of states |n⟩ is a product of fields. Given two atoms with independent states of energy, |nm⟩ = |n⟩|m⟩: the two atoms have the state |n⟩ and |m⟩ that is the product of the separate states of atom 1 and atom 2.
3. The space H is the space whose dimension is the number of elementary fields or states.
4. In quantum mechanics, we compute the probability that the qubit takes the value 1 or 0 over all of space-time. A classical bit is not a field of probability that the bit assumes the value 1 in space-time: in the classical sense, the bit is 1 at a given place and time. In quantum mechanics, we have a distribution of the probability that a bit assumes the value 1 over space-time. Therefore we have a field of uncertainty for the value 1 as well as a field of uncertainty for the value 0. This leads to the concept of the qubit.
5. Any space of 2^m − 1 dimensions is represented by H, and UH is the unitary transformation of H by which computations are made in quantum physics.

Explanation: H is a matrix of qubits that can assume different values. For simplicity we use only the numbers 1 and 0, where 1 means a qubit that assumes the value 1 in all of space-time, and similarly for 0. In quantum mechanics, the qubit value can be changed only by particular transformations U, the unitary transformations; only unitary transformations can be applied. Given a unitary transformation U such that UᵀU = I …
Q1(X, Y) = X and Q2(X, 1) = false. The contradiction corresponds to the permutation

¬X ∧ X = ( P1 P2 P3 P4 → P2 P1 P4 P3 ) = (1, 2)(3, 4),

and, in the set form,

false = ¬X ∧ X = {(1, 2)} ∪ {(3, 4)} = {(1, 2), (3, 4)} = Universe.

For Y = 1 we obtain X ⊕ Y = ¬X.
Points in the 2-D Hilbert space for the four states are

|ψ = 1⟩ = (1, 0)ᵀ,   |ψ = 0⟩ = (0, 1)ᵀ,

and F = { X, ¬X, ¬X ∧ X, ¬X ∨ X }.

To obtain the Boolean function NAND we must extend the space of the objects and introduce a 3-D attribute space. Thus, we have the Toffoli transformation and the permutation

U = ( P1 P2 P3 P4 P5 P6 P7 P8 → P1 P2 P3 P4 P5 P6 P8 P7 ) = (7, 8),

whose matrix is the 8 × 8 identity with the last two rows exchanged:

     1 0 0 0 0 0 0 0
     0 1 0 0 0 0 0 0
     0 0 1 0 0 0 0 0
U =  0 0 0 1 0 0 0 0
     0 0 0 0 1 0 0 0
     0 0 0 0 0 1 0 0
     0 0 0 0 0 0 0 1
     0 0 0 0 0 0 1 0

Here H = [X Y Z] is the matrix whose rows list the truth values of the attributes X, Y and Z for the objects P1, …, P8.

In classical Boolean algebra, complex expressions are obtained by composition using the elementary operations (NOT, AND, and OR). In the quantum computer, we cannot use the same …
Reading the attribute columns of H gives Q1(X, Y, Z) = X and Q2(X, Y, Z) = Y. Now we put Q(X, Y, Z) = UH. For the third column this gives

X = 0, Y = 0, Z = 1 → 1
X = 1, Y = 0, Z = 1 → 1
X = 0, Y = 1, Z = 1 → 1
X = 1, Y = 1, Z = 1 → 0

The permutation (7, 8) introduces a zero, thus Q3(1, 1, 1) = 0, and

Q3(X, Y, 1) = ¬(X ∧ Y) = X NAND Y.

In the permutation representation, the attribute X corresponds to

X = ( P1 P2 P3 P4 P5 P6 P7 P8 → P2 P1 P3 P4 P6 P5 P7 P8 ) = (1, 2)(5, 6),

that is, X = {(1, 2), (5, 6)}.
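The NAND construction just described can be checked numerically. A minimal sketch (the row ordering of H and the NumPy encoding are assumptions; only the transposition (7, 8) is taken from the text):

```python
import numpy as np

# H: truth table of the attributes X, Y, Z over the objects P1..P8.
H = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)])

# U: permutation matrix of the transposition (7, 8), i.e. the Toffoli gate in
# this representation (the 8x8 identity with the last two rows exchanged).
U = np.eye(8, dtype=int)
U[[6, 7]] = U[[7, 6]]
assert (U.T @ U == np.eye(8, dtype=int)).all()      # U is unitary (orthogonal)

Q3 = (U @ H)[:, 2]                                   # third column of UH

# For the inputs with Z = 1 the third column reproduces X NAND Y.
for (x, y, z), q3 in zip(H, Q3):
    if z == 1:
        assert q3 == 1 - (x & y)
        print(f"X={x} Y={y} Z=1 -> Q3={q3}")
```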
For the operation X ∧ Y, the corresponding permutation matrix UP has a third column that gives

X = 0, Y = 0, Z = 1 → 0
X = 1, Y = 0, Z = 1 → 0
X = 0, Y = 1, Z = 1 → 0
X = 1, Y = 1, Z = 1 → 1

that is, Q3 = X ∧ Y. In the set form,

X = {(1, 2), (5, 6)},   ¬X = {(1, 2), (5, 6)}ᶜ = {(3, 4), (7, 8)}.

For the operation X ∨ Y we have the permutation

X ∨ Y = ( P1 P2 P3 P4 P5 P6 P7 P8 → P2 P1 P3 P4 P5 P6 P7 P8 ) = (1, 2).

For the contradiction we have
the unitary transformation UP whose third column gives

X = 0, Y = 0, Z = 1 → 0
X = 1, Y = 0, Z = 1 → 0
X = 0, Y = 1, Z = 1 → 0
X = 1, Y = 1, Z = 1 → 0

that is, Q3 = ¬X ∧ X. This is due to the elementary permutation

X ∧ ¬X = ( P1 P2 P3 P4 P5 P6 P7 P8 → P2 P1 P4 P3 P6 P5 P8 P7 ) = (1, 2)(3, 4)(5, 6)(7, 8),

or, in the set form,

X ∧ ¬X = {(1, 2), (5, 6)} ∪ {(3, 4), (7, 8)} = {(1, 2), (3, 4), (5, 6), (7, 8)} = Γ,

where Γ = {(1, 2), (3, 4), (5, 6), (7, 8)}.

Now, having permutations and rules to compute AND, OR and NOT, we can generate any function of the two variables X and Y. For example,

¬X = ( P1 P2 P3 P4 P5 P6 P7 P8 → P1 P2 P4 P3 P5 P6 P8 P7 ) = {(3, 4), (7, 8)},
¬Y = ( P1 P2 P3 P4 P5 P6 P7 P8 → P1 P2 P3 P4 P6 P5 P8 P7 ) = {(5, 6), (7, 8)},

and, by the De Morgan rule ¬(X ∧ Y) = ¬X ∨ ¬Y, we get

¬X ∨ ¬Y = {(3, 4), (7, 8)} ∩ {(5, 6), (7, 8)} = {(7, 8)} = ¬(X ∧ Y).

Similarly, with Y = {(1, 2), (3, 4)},

X → Y = ¬X ∨ Y = {(3, 4), (7, 8)} ∩ {(1, 2), (3, 4)} = {(3, 4)}.
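The transposition-set calculus used above (AND as union, OR as intersection, NOT as complement with respect to Γ) can be replayed directly with Python sets; the operator names below are illustrative:

```python
GAMMA = {(1, 2), (3, 4), (5, 6), (7, 8)}
X = {(1, 2), (5, 6)}
Y = {(1, 2), (3, 4)}

def NOT(a):    return GAMMA - a     # complement with respect to the universe
def AND(a, b): return a | b         # union, as used for X AND (NOT X) above
def OR(a, b):  return a & b         # intersection, as used for (NOT X) OR (NOT Y)

assert NOT(X) == {(3, 4), (7, 8)} and NOT(Y) == {(5, 6), (7, 8)}
assert AND(X, NOT(X)) == GAMMA                               # contradiction covers Gamma
assert OR(NOT(X), NOT(Y)) == {(7, 8)} == NOT(AND(X, Y))      # De Morgan
assert OR(NOT(X), Y) == {(3, 4)}                             # X -> Y
```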
Finally, in the quantum computer, given a set of quantum states, we have

true = (1, 0)ᵀ,   false = (0, 1)ᵀ.

If the parameters of the state are equal to (0, 1)ᵀ, then ψ = false, where false can be the spin down and true can be the spin up. Now, for different particles, a set of logic values v(p) can be defined by using quantum agents, where ψ = true or ψ = false, and p is a logic proposition, that is, p = "the particle is in the state ψ". For example, the particle can be in the spin-up state, or the particle has the velocity v, and so on.

Definition. A qagent is an agent that performs a quantum measurement.

The quantum agents, or "qagents", and the evaluation vector can be written in the language of the AUT in the usual way:

         qagent1   qagent2   …   qagentn
v(p)  =  h1        h2        …   hn

Using the previous definition of the logic in quantum mechanics, hk ∈ {(1, 2), (3, 4)}, and

            qagent1    qagent2    …   qagentn
v(p ∧ q) =  v1 ∧ h1    v2 ∧ h2    …   vn ∧ hn

with

vk ∧ hk ∈ {(1, 2), (5, 6)} ∪ {(1, 2), (3, 4)} = {(1, 2), (3, 4), (5, 6)},

and

            qagent1    qagent2    …   qagentn
v(p ∨ q) =  v1 ∨ h1    v2 ∨ h2    …   vn ∨ hn

Here v(p) = (v1, v2, …, vn)ᵀ, where each vk ∈ {true, false} = {(1, 0)ᵀ, (0, 1)ᵀ}. Finally, we have the logic vector
with the fused value

m(p • q) = p • q = w1 S1 + w2 S2 + … + wn Sn.
[Table: truth values assigned by two qagents (states n1, n2) to superpositions of |0⟩ and |1⟩ with amplitudes α and β, and their fused values: true, ½ true, or false.]
Correlation (entanglement) in quantum mechanics and the second order conflict. Consider two interacting particles as agents. These agents are interdependent. Their correlation is independent of any physical communication by any type of field. We can say that this correlation is a characteristic of a logical state of the particles rather than a dependence. We can view the quantum correlation as a conflicting state because we know from quantum mechanics that there is a correlation, but when we try to measure the correlation we cannot check the correlation itself: the measurement process destroys the correlation. Thus, if the spin of one electron is up and the spin of another electron is down, the first spin is changed when we have correlation or entanglement. This generates a change of the other spin instantaneously, and this is manifested in a statistical correlation different from zero. For more explanation see D'Espagnat (1999).

NEURAL IMAGE OF AUT

In this section, we show the possibility of a new type of neural network based on the AUT. This type of neural network is dedicated to the computation of the many-valued logic operations used to model uncertainty processes. Traditional neural networks model Boolean operations and classical logic. In the new fuzzy neural network, we combine two logic levels (classical and fuzzy) in the same neural network, as presented in Figure 8. Figures 9-11 show that at the first level we have the ordinary Boolean operations (AND, OR, and NOT). At the second level, the network fuses the results of the different Boolean operations. As a result, a many-valued logic value is generated, as Figure 8 shows.
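A minimal sketch of the two-level network just described, with illustrative names (not the authors' implementation): first-level neurons (agents) apply a classical Boolean operation to their own crisp inputs, and a second-level nagent fuses the possibly conflicting results into a many-valued value, taken here as the frequency of true:

```python
def first_level_and(agent_inputs):
    # Each agent computes the classical AND of its own inputs.
    return [p and q for (p, q) in agent_inputs]

def fuse(bits):
    # Second level: fusion into a many-valued value.
    return sum(bits) / len(bits)        # 1.0 = true, 0.5 = "1/2 true", 0.0 = false

agent_inputs = [(True, True), (True, False)]    # two agents disagree about q
print(fuse(first_level_and(agent_inputs)))      # 0.5, i.e. "1/2 true"
```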
Figure 10. AUT Many-valued logic operation OR in the AUT neural network
Now we show an example of the many-valued logic operation AND with the agents and fusion in the neural network. Table 4 presents the individual AND operation for a single agent in the population of two agents. With the fusion process of the AUT we obtain the many-valued logic in the neural network.

In Table 5 we use the aggregation rule that generates a many-valued logic structure with the following three logic values (with equivalent notations):

Ω = { true,  (true + false)/2 = ½ true,  false }.

Having the commutative rule, we derive

(false + false)/2 = false.

The previous composition rule can be written in the simple form shown in Table 6, where the different
results for the same pair of elements are located in the same cell. The fusion of the OR operation over N agents is written as

             Agent1           Agent2           …   AgentN
V(p ∨ q)  =  u1(p) ∨ u1(q)    u2(p) ∨ u2(q)    …   uN(p) ∨ uN(q)

[Tables 5 and 6: pairwise aggregations of the nagent values true, ½ true and false, computed as (v1 + v2)/2, e.g. (false + false)/2 = false, (true + true)/2 = true, (true + false)/2 = ½ true.]

Table 6 contains two different results for the AND operation for p = ½ true and q = ½ true: one is (false + false)/2 = false and the other is ½ true. In this case we have no criterion to choose one or the other. Here the operation AND is not uniquely defined; we have two possible results, one false and the other ½ true. The first is shown in Table 7 and the second in Table 8. The neuron image of the previous operation is shown in Figure 12.
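The non-uniqueness of the fused AND can be seen with two agents; a small sketch, assuming true and false are encoded as 1 and 0:

```python
def fuse(bits):
    return sum(bits) / len(bits)

def fused_and(p_bits, q_bits):
    # Agent-wise AND followed by fusion at the nagent.
    return fuse([p & q for p, q in zip(p_bits, q_bits)])

# p and q are both "1/2 true" (each held true by one of two agents), yet the
# fused AND depends on which agent holds the true value:
print(fused_and([1, 0], [0, 1]))   # 0.0 -> false
print(fused_and([1, 0], [1, 0]))   # 0.5 -> "1/2 true"
```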
Benenti, G. (2004). Principles of Quantum Computation and Information (Vol. 1). New Jersey: World Scientific.

Carnap, R., & Jeffrey, R. (1971). Studies in Inductive Logics and Probability (Vol. 1, pp. 35-165). Berkeley, CA: University of California Press.

Chalkiadakis, G., & Boutilier, C. (2008). Sequential Decision Making in Repeated Coalition Formation under Uncertainty. In Padgham, Parkes, Müller, & Parsons (Eds.), Proceedings of the 7th International Conference on Autonomous Agents and Multi-agent Systems (AAMAS 2008), May 12-16, 2008, Estoril, Portugal. http://eprints.ecs.soton.ac.uk/15174/1/BayesRLCF08.pdf

Colyvan, M. (2004). The Philosophical Significance of Cox's Theorem. International Journal of Approximate Reasoning, 37(1), 71-85. doi:10.1016/j.ijar.2003.11.001

Colyvan, M. (2008). Is Probability the Only Coherent Approach to Uncertainty? Risk Analysis, 28, 645-652. doi:10.1111/j.1539-6924.2008.01058.x

D'Espagnat, B. (1999). Conceptual Foundations of Quantum Mechanics (2nd ed.). Perseus Books.

DiVincenzo, D. (1995). Quantum Computation. Science, 270(5234), 255-261. doi:10.1126/science.270.5234.255

DiVincenzo, D. (2000). The Physical Implementation of Quantum Computation. Experimental Proposals for Quantum Computation. arXiv:quant-ph/0002077

Edmonds, B. (2002). Review of Reasoning about Rational Agents by Michael Wooldridge. Journal of Artificial Societies and Social Simulation, 5(1). Retrieved from http://jasss.soc.surrey.ac.uk/5/1/reviews/edmonds.html

Fagin, R., & Halpern, J. (1994). Reasoning about Knowledge and Probability. Journal of the ACM, 41(2), 340-367. doi:10.1145/174652.174658

Ferber, J. (1999). Multi Agent Systems. Addison Wesley.

Feynman, R. (1982). Simulating physics with computers. International Journal of Theoretical Physics, 21, 467. doi:10.1007/BF02650179

Flament, C. (1963). Applications of graph theory to group structure. London: Prentice Hall.

Gigerenzer, G., & Selten, R. (2002). Bounded Rationality. Cambridge: The MIT Press.

Halpern, J. (2005). Reasoning about Uncertainty. MIT Press.

Harmanec, D., Resconi, G., Klir, G. J., & Pan, Y. (1995). On the computation of uncertainty measure in Dempster-Shafer theory. International Journal of General Systems, 25(2), 153-163. doi:10.1080/03081079608945140

Hiroshi, I., & Masahito, H. (2006). Quantum Computation and Information. Berlin: Springer.

Hisdal, E. (1998). Logical Structures for Representation of Knowledge and Uncertainty. Springer.

Jaeger, G. (2006). Quantum Information: An Overview. Berlin: Springer.

Kahneman, D. (2003). Maps of Bounded Rationality: Psychology for Behavioral Economics. The American Economic Review, 93(5), 1449-1475. doi:10.1257/000282803322655392

Kovalerchuk, B. (1990). Analysis of Gaines' logic of uncertainty. In I. B. Turksen (Ed.), Proceedings of NAFIPS '90 (Vol. 2, pp. 293-295).

Kovalerchuk, B. (1996). Context spaces as necessary frames for correct approximate reasoning. International Journal of General Systems, 25(1), 61-80. doi:10.1080/03081079608945135

Kovalerchuk, B., & Vityaev, E. (2000). Data mining in finance: advances in relational and hybrid methods. Kluwer.

Montero, J., Gomez, D., & Bustine, H. (2007). On the relevance of some families of fuzzy sets. Fuzzy Sets and Systems, 16, 2429-2442. doi:10.1016/j.fss.2007.04.021

Nielsen, M., & Chuang, I. (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.

Priest, G., & Tanaka, K. (2004). Paraconsistent Logic. Stanford Encyclopedia of Philosophy. http://plato.stanford.edu/entries/logic-paraconsistent

Resconi, G., & Jain, L. (2004). Intelligent Agents. Springer Verlag.

Resconi, G., Klir, G. J., Harmanec, D., & St. Clair, U. (1996). Interpretation of various uncertainty theories using models of modal logic: a summary. Fuzzy Sets and Systems, 80, 7-14. doi:10.1016/0165-0114(95)00262-6

Resconi, G., Klir, G. J., & St. Clair, U. (1992). Hierarchical uncertainty metatheory based upon modal logic. International Journal of General Systems, 21, 23-50. doi:10.1080/03081079208945051

Resconi, G., Klir, G. J., St. Clair, U., & Harmanec, D. (1993). The integration of uncertainty theories. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 1, 1-18.

Resconi, G., & Kovalerchuk, B. (2006). The Logic of Uncertainty with Irrational Agents. In Proceedings of JCIS-2006, Advances in Intelligent Systems Research, Taiwan. Atlantis Press.

Resconi, G., Murai, T., & Shimbo, M. (2000). Field Theory and Modal Logic by Semantic Field to Make Uncertainty Emerge from Information. International Journal of General Systems, 29(5), 737-782. doi:10.1080/03081070008960971

Resconi, G., & Turksen, I. B. (2001). Canonical Forms of Fuzzy Truthoods by Meta-Theory Based Upon Modal Logic. Information Sciences, 131, 157-194. doi:10.1016/S0020-0255(00)00095-5

Ruspini, E. H. (1999). A new approach to clustering. Information and Control, 15, 22-32. doi:10.1016/S0019-9958(69)90591-9

Stolze, J., & Suter, D. (2004). Quantum Computing. Wiley-VCH. doi:10.1002/9783527617760

Sun, R., & Qi, D. (2001). Rationality Assumptions and Optimality of Co-learning. In Design and Applications of Intelligent Agents (LNCS 1881, pp. 61-75). Berlin/Heidelberg: Springer.

van Dinther, C. (2007). Adaptive Bidding in Single-Sided Auctions under Uncertainty: An Agent-based Approach in Market Engineering (Whitestein Series in Software Agent Technologies and Autonomic Computing). Basel: Birkhäuser.

Vandersypen, L. M. K., Yannoni, C. S., & Chuang, I. L. (2000). Liquid state NMR Quantum Computing.

Von-Wun Soo. (2000). Agent Negotiation under Uncertainty and Risk. In Design and Applications of Intelligent Agents (LNCS 1881, pp. 31-45). Berlin/Heidelberg: Springer.

Wooldridge, M. (2000). Reasoning about Rational Agents. Cambridge, MA: The MIT Press.

Wu, W., Ekaette, E., & Far, B. H. (2003). Uncertainty Management Framework for Multi-Agent System. Proceedings of ATS. http://www.enel.ucalgary.ca/People/far/pub/papers/2003/ATS2003-06.pdf

KEY TERMS AND DEFINITIONS

Logic of Uncertainty: A field that deals with the logic aspects of uncertainty modeling.

Conflicting Agents: Agents that have a self-conflict or conflict with other agents in judging the truth of specific statements.

Fuzzy Logic: A field that deals with modeling uncertainty based on Zadeh's fuzzy sets.
Section 3
Bio-Inspired Agent-Based
Artificial Markets
Chapter 5
Bounded Rationality and
Market Micro-Behaviors:
Case Studies Based on Agent-
Based Double Auction Markets
Shu-Heng Chen
National Chengchi University, Taiwan
Ren-Jie Zeng
Taiwan Institute of Economic Research, Taiwan
Tina Yu
Memorial University of Newfoundland, Canada
Shu G. Wang
National Chengchi University, Taiwan
ABSTRACT
We investigate the dynamics of trader behaviors using an agent-based genetic programming system to simulate double-auction markets. The objective of this study is two-fold. First, we seek to evaluate whether, and how, differences in trader rationality/intelligence influence trading behavior. Second, besides rationality, we also analyze whether, and how, the co-evolution between two learnable traders impacts their trading behaviors. We have found that traders with different degrees of rationality may exhibit different behavior depending on the type of market they are in. When the market has a profit zone to explore, the more intelligent trader demonstrates more intelligent behaviors. Also, when the market has two learnable buyers, their co-evolution produces more profitable transactions than when there is only one learnable buyer in the market. We have analyzed the trading strategies and found that the learning behaviors are very similar to those of humans in decision-making. We plan to conduct human subject experiments to validate these results in the near future.
DOI: 10.4018/978-1-60566-898-7.ch005
Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
within their budget constraints (i.e., traders were not permitted to sell below their costs or buy above their values). The DA market with only zero-intelligence traders was able to achieve almost 100% market efficiency. Based on the results, Gode and Sunder argued that the rationality of individual traders accounts for a relatively small fraction of the overall market efficiency.

To investigate the generality of Gode and Sunder's result, other researchers have conducted similar experiments using zero-intelligence traders in various types of DA markets. For example, Cliff and Bruten (1997) studied a DA market with asymmetric supply and demand curves. They found that zero-intelligence traders gave rise to poor market efficiency. They then gave the traders the ability to use the closing price in the previous auction round to determine the current bid. Such traders, which they referred to as zero-intelligence-plus, performed better and improved market efficiency. Thus, individual traders' cognitive ability does impact overall market efficiency.

GP to Implement Bounded Rationality

Mapping an evolutionary process to model bounded rationality in human decision-making was first proposed in (Arthur, 1993). There, the author extended the key precept of bounded rationality – limits to information and cognition – by positing that learning from experience is important in explaining sub-optimality, the creation of heuristics, and limits to information. During the learning process, individuals improve their strategies through a Darwinian process of learning-by-doing that balances the path-dependent exploration of new strategies, by extending current strategies, against simply exploiting existing strategies (Palmer, Arthur, Holland, LeBaron, & Tayler, 1994).

Genetic Programming (GP) (Koza, 1992) is one evolutionary system that has been used to implement bounded rationality. However, most of the studies are not related to the co-evolution dynamics of individual behaviors. For example, Manson (2006) implemented bounded rationality using GP symbolic regression to study land change in the southern Yucatán peninsular region of Mexico. In that work, each household decision maker is represented as a GP symbolic regression. To best represent bounded rationality in his problem domain, the author investigated 5 different GP parameter settings: the fitness function, the creation operator, the selection operator, the population size and the number of generations.

Previously, we have used GP to implement bounded rationality to study the co-evolution dynamics of traders' behaviors in an artificial DA market (Chen, Zeng, & Yu, 2009; Chen, Zeng, & Yu, 2009a). In that study, the market has two types of traders: GP traders, who have the ability to learn and improve their trading strategies, and naive (no-learning) truth-telling traders, who always present the assigned prices during an auction. To distinguish the cognitive abilities of GP traders, different population sizes were assigned to these traders.

The rationale for this design decision is based on the learning-from-experience analogy of Arthur (1993). In a DA market, a trader's strategies are influenced by two factors: the trader's original ideas of how to bid/ask, and the experiences he/she gains during the auction process. In GP learning, the population is the brain that contains the possible strategies to be used for the next bid/ask. It is therefore reasonable to argue that a GP trader with a bigger population size has a larger reservoir to store and process new strategies, and hence is more intelligent.

We have designed two controlled settings to conduct the experiments. In the first setting, there was only one GP buyer among a group of truth-telling traders. The experimental results show that, when assigned a larger population size, the GP buyer was able to evolve a higher-profit strategy,
Figure 3. Market I Demand and Supply Curves
Figure 4. Market II Demand and Supply Curves
Each DA market simulation is carried out with a fixed number of GP generations (g), where each generation lasts n (n = 2 × pop_size) days. On each day, 4 new tokens are assigned to each of the buyers and sellers. The 8 traders then start the auction rounds to trade the 16 tokens. A buyer will start from the token with the highest price and then move to the lower-priced ones, while a seller will start from the token with the lowest price and then move to the higher-priced ones. The day ends when either all 16 tokens have been successfully traded or the maximum number of 25 auction rounds is reached. Any un-traded tokens (due to no matching price) will be cleared at the end of each day. The following day will start with a new set of 16 tokens.

On each day, a GP buyer will randomly select one strategy from its population and use it for the entire day to decide the bidding prices. The strategy might be to pass the round without giving a bid. By contrast, a truth-telling trader never passes an auction: a truth-telling buyer bids the highest value of the tokens it owns, while a truth-telling seller asks the lowest value of the tokens it has. The same 8 strategies will play for the day's 25 auction rounds, during which a GP trader may give a different bidding price if the auction strategy uses information from the previous round/day. The truth-teller, however, will always present the same bid/ask throughout the 25 rounds.

In each auction round, after all 8 traders have presented their prices, the highest bid and the lowest ask will be selected. If there are multiple buyers giving the same highest bid, or multiple sellers giving the same lowest ask, one of them will be selected based on their order, i.e., buyer 1 is preferred over buyer 2, and so on.
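A minimal sketch of one auction round under the rules just described (function and variable names are illustrative, not the authors' implementation):

```python
def run_round(bids, asks):
    """bids, asks: lists of (trader_id, price); a trader passes with price None."""
    live_bids = [(price, tid) for tid, price in bids if price is not None]
    live_asks = [(price, tid) for tid, price in asks if price is not None]
    if not live_bids or not live_asks:
        return None
    bid, buyer = max(live_bids, key=lambda t: (t[0], -t[1]))   # highest bid, lowest id on ties
    ask, seller = min(live_asks)                               # lowest ask, lowest id on ties
    if bid < ask:
        return None                                            # no matching price this round
    return buyer, seller, (bid + ask) / 2                      # price = average of bid and ask

# Four buyers all bidding 76 against asks of 75: buyer 1 wins the tie at 75.5.
print(run_round(bids=[(1, 76), (2, 76), (3, 76), (4, 76)],
                asks=[(1, 75), (2, 75), (3, 75), (4, 75)]))    # (1, 1, 75.5)
```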
Table 6. GP Parameters

Since the final price is the average of the winning bid and ask, the buyer might still make a profit from the transaction. However, such a risk-taking approach has been shown to make the market unstable and to reduce market efficiency (Chen, S.-H., & Tai, C.-C., 2003). We therefore enforce the following rule on the price generated by a GP-evolved strategy:

if Bid > 2 × HTV then Bid = HTV

This rule protects the market from becoming too volatile and also allows GP to evolve rules that take on a small amount of risk to generate a profit.

Table 6 gives the GP parameter values used to perform the simulation runs. With 2 different population sizes (10, 50) and 2 different ways to assign GP buyers, the total number of setups is 4. For each setup, we made 90 runs. The number of simulation runs made for each market is therefore 360. With 3 market types, the total number of simulation runs is 1,080.

RESULTS AND ANALYSIS

For each market type, we analyze two scenarios: one GP buyer in the market and two GP buyers in the market. In the first case, the focus is on how the GP population size influences the learning of strategies. In the second case, besides population size, we also investigate the co-evolution dynamics of the two GP buyers and its impact on the learning of strategies.

To conduct our analysis, we collected all evolved strategies and their daily profit (F) generated during the last 10 generations of each run. We consider these strategies to be more "mature", and hence to better represent the GP buyers' trading patterns.

When the population size is 10, each generation is 2 × 10 = 20 days long. On each day, one strategy is picked randomly from the population to conduct the auction. The total number of strategies used during the last 10 generations is therefore 20 × 10 = 200. Since we made 90 runs for this setup, the number of strategies used to conduct our analysis is 200 × 90 = 18,000.

When the population size is 50, each generation is 2 × 50 = 100 days long. The total number of auction days (also the number of strategies picked to conduct the auction) during the last 10 generations for all 90 runs is 100 × 10 × 90 = 90,000. The following subsections present our analysis of these GP-evolved strategies under three different markets.

Market I

One GP Buyer in the Market

When there is one GP buyer (with population size 10) in this market, the daily profit (F) generated by the 18,000 strategies is between -41 and 3.5 (see Table 7). Among them, more than 95% of the strategies give a profit that is greater than 2, which is better than that produced by a naive truth-teller (see Section 4).
This indicates that the GP buyer is more "intelligent" than the naive truth-telling buyers.

The strategies that generate a daily profit of 3.5 can be divided into two categories: NTV (the second highest token value) and those with a length greater than or equal to 2. As shown in Table 8, NTV was used to conduct more than 83% of the auctions, and we therefore decided to study how it generated the higher profit.

This strategy is actually quite smart: it bids the second highest token value when all other truth-telling buyers bid the highest token price. During the first 3 auction rounds, when at least one truth-telling buyer bid the highest token value of 79, the GP buyer, who bid the second highest token value of 76, could not win the auction. However, after the 3 truth-telling buyers purchased their first tokens and each earned a profit of 2, they moved to bid the next highest token value of 76. Since buyer 1, the GP buyer, was preferred when there were multiple buyers giving the same highest bid (see Section 5), the GP buyer won the 4th auction round and performed the transaction using the average of the highest bid (76) and the lowest ask (75), which was 75.5. The token value that the GP buyer was purchasing was 79, so the profit of this transaction for the GP buyer is 79 − 75.5 = 3.5. In a market where all buyers have the same token values, this strategy of "waiting until all other buyers have purchased their tokens before winning the auction" is more profitable.

Did the more "intelligent" (population size 50) GP buyer devise a better strategy? We examined all 90,000 strategies but did not find one. Table 9 shows that the more "intelligent" GP buyer used the profit-3.5 strategies slightly more often to conduct the auction (93% vs. 92%). Other than that, there was no significant difference between the behaviors of the GP buyers with population sizes of 10 and 50. This suggests that in a stable market (all other traders are truth-tellers) where all buyers have the same token values and all sellers have the same token values, a small degree of intelligence is sufficient to devise the optimal
strategy (the one that generates a daily profit of 3.5 is the optimal one in this market). Any increase in the traders' intelligence/rationality has no significant impact on their behaviors. In other words, the relationship between intelligence and performance is not visible.

Two GP Buyers in the Market

When both buyers 1 and 2 are equipped with GP learning ability, the trading behaviors become more complicated. Table 10 gives information about the 2 most used strategies by the 2 GP buyers under population sizes of 10 and 50.

It appears that both GP buyers learned the NTV strategy. When they used this strategy to bid against each other, GP buyer 1 earned a daily profit of 4 while GP buyer 2 earned a daily profit of 3. How did this happen?

We traced the market's daily transactions and found that the bias in the market setup gives GP buyer 1 an advantage over GP buyer 2, who in turn has an advantage over buyers 3 and 4. During the first 2 auction rounds, each of the two truth-telling buyers (who bid 79) won one auction round and made a profit of 2 by carrying out the transaction using a price of (79 + 75)/2 = 77. In round 3, all buyers bid the second highest token value of 76. However, buyer 1, a GP buyer, is selected, based on the market setup, to carry out the transaction using the price of (76 + 75)/2 = 75.5. The profit earned by buyer 1 is therefore 79 − 75.5 = 3.5. In the next auction round, all buyers bid 76 again and buyer 1 is again selected to carry out the transaction using the price of 75.5. Since GP buyer 1 is purchasing its second token, whose value is 76, the profit for this transaction is 76 − 75.5 = 0.5. After that, GP buyer 1 did not make any profitable transaction, and its total daily profit is 4.

The second buyer, who also has GP learning ability, only gets to win the auction in round 5, when the first GP buyer has purchased two tokens. In round 5, GP buyer 1 bids its next highest token value of 75 (see Table 1) and all other buyers bid 76. Buyer 2, a GP buyer, is selected over buyers 3 and 4 to carry out the transaction using the price (76 + 76)/2 = 76 (note that all 4 sellers are trading their second lowest token, with a value of 76, as each has sold its 75 token during the first 4 auction rounds). Since GP buyer 2 is purchasing its first token, with value 79, the profit gained in this transaction is 79 − 76 = 3. After that, no market transactions are profitable, due to the increase in seller token prices and the decrease in buyer token prices. The second GP buyer earned a total daily profit of 3.

When the population size of both GP buyers is increased to 50, Table 10 shows that there is no significant difference in their behaviors. This might also be due to the market type, as explained previously.

Market II

One GP Buyer in the Market

In Market II, the supply and demand curves are almost parallel with each other. A naive truth-telling strategy would trade all 16 tokens successfully. What kind of strategies would the GP
buyers evolve under this market environment? We examined all 18,000 strategies that were evolved by the GP buyer with population size 10 and found that they had 115 different daily profit values. Among them, the highest profit is 69,705, which is much higher than the profit earned by the truth-telling strategy (34,863.5). Since strategies with a profit of 69,705 were used the most (68%) during the auction, we decided to analyze how they earned the higher profit.

Table 11 gives the strategies that produce a daily profit of 69,705. Among them, the 3 most used strategies are:

• CASK: the lowest asking price in the previous auction round;
• PMaxAsk: the highest asking price on the previous day;
• PMin: the lowest transaction price on the previous day.

One common feature of these 3 strategies is that they all use information from previous transactions (either on the previous day or in the last auction round) to decide the current bidding price. This is actually a very wise strategy in this type of market, where the quantities of supply and demand are equal (16). Under such conditions, a buyer can bid any price to win 4 auction rounds, as long as the price is above the sellers' token values. So, the closer the bidding price is to the sellers' token value, the higher the profit is for the buyer. However, where can a buyer find the sellers' token values? One source is the sellers' asking prices in the auction that took place on the previous day (PMaxAsk) or in the previous auction round (CASK). Another source is the transaction prices of the auction that took place on the previous day (PMin) or in the previous auction round. The GP buyer has learned that knowledge and has been able to use the acquired information to make the lowest possible bid. Consequently, its profit is way above the profit made using the truth-telling strategy.
This indicates that in a stable (all other traders are
One common feature of these 3 strategies is truth-tellers) market and the supply and demand
that they all used information from the previous quantities are the same, more intelligent GP buy-
transactions (either on the previous day or in the ers exhibit more intelligent behavior by using the
last auction) to decide the current bidding price. higher-profit strategies more frequently.
This is actually a very wise strategy in this type
of market where the quantities of supply and Two GP Buyers in the Market
demand are equal (16). Under such conditions, a
buyer can bid any price to win 4 auction rounds When both buyers 1 and 2 are equipped with
as long as the price is above the sellers’ token GP learning ability, the market becomes more
values. So, the closer the bidding price is to the competitive as both of them devise strategies to
sellers’ token value, the higher the profit is for outdo each other to make a profit. Under such a
Under such a competitive environment, we found that both GP buyers evolved more profitable strategies than were evolved when the market had only one GP buyer. Table 13 gives the strategies evolved by the two GP buyers with population sizes 10 and 50.

Although both GP buyers evolved higher-profit strategies, GP buyer 1 evolved one group of strategies that GP buyer 2 did not evolve: the group that gives the highest profit of 69,715. Moreover, GP buyer 1 applied the two higher-profit strategies (with profits 69,710 and 69,715) more frequently than GP buyer 2 did. This suggests that GP buyer 1 won the competition during the co-evolution of bidding strategies in this market. Was this really the case?

We examined all strategies and found that both GP buyers have learned to use past transaction information, such as CASK, PMaxAsk and PMin, to make more profitable bids. Depending on which of these strategies were used against each other, GP buyers 1 and 2 each earned a different amount of profit. If both GP buyers used CASK, they would give the same bids; under the market mechanism where buyer 1 was preferred over buyer 2, GP buyer 1 won the bid and earned the higher profit (69,715 vs. 69,705). However, if one GP buyer used PMaxAsk and the other used CASK, the one that used PMaxAsk would earn more profit (69,710 vs. 69,705). This is because PMaxAsk bids the highest token price of the sellers (42), while CASK bids the lowest asking price of the sellers in the previous auction round, which can be 33, 34, 39, 40 or 42 (see Table 2). Consequently, PMaxAsk won the auction and
earned the higher profit. Table 13 shows that GP buyer 1 used the profit-69,715 strategies most frequently. This indicates that GP buyer 1 won the competition due to the advantage it received from the market setup bias.

When the population size of the two GP buyers was increased to 50, the market setup bias no longer dominated the market dynamics. Instead, GP buyer 2 started to use PMaxAsk more often against GP buyer 1's CASK and earned the higher profit. As a result, the frequency with which GP buyer 1 earned a profit of 69,715 was reduced (from 30% to 18%). Again, more intelligent GP buyers exhibited different behavior under the co-evolution setting in this market.

Market III

One GP Buyer in the Market

Market III is similar to Market II in that the quantities of supply and demand are equal (16). Did the GP buyer learn to use information from the previous auction to obtain the sellers' token price and make the lowest bid to earn the most profit? Table 14 gives the most used strategies by the GP buyer with population sizes 10 and 50. It is clear that the GP buyer has learned that knowledge. The most frequently used strategy is CASK, which earned a daily profit of 15,978. This profit is much higher than the profit earned by the truth-telling strategy (11,947.5).

The more intelligent GP buyer (who had a population size of 50) developed a similar style of strategies that gave a profit of 15,978. However, a small number of these strategies had more complex structures, such as Max CASK CASK and If bigger then else CASK CASK CASK CASK. This is an understandable behavior change, since a larger population size gives GP more room to maintain the same-profit strategies with more diversified structures.

Two GP Buyers in the Market

Similar to the strategies in Market II, when both buyers 1 and 2 were equipped with GP learning ability, they both evolved strategies that earned more profit than those evolved when there was only 1 GP buyer in the market. Another similarity is that GP buyer 1 evolved the strategies that earned the highest profit of 17,765, which GP buyer 2 did not evolve. Table 15 gives the most used strategies by the 2 GP buyers with population sizes of 10 and 50.

The strategies also have dynamics similar to those in Market II. When both GP buyers used CASK, GP buyer 1 had an advantage and earned 17,765, while GP buyer 2 earned 15,975. When one GP buyer used CASK and the other used PMaxAsk, the one that used PMaxAsk earned a higher profit (16,863.5 vs. 15,978).

However, in this market, GP buyer 1 only used the strategies that earned 17,765 in 15% of the auctions when both GP buyers had a population size of 10. This indicates that the market mechanism bias could not make buyer 1 win the competitive co-evolution in this market. Other factors, such as the supply and demand prices of the 16 tokens, also influenced the co-evolution dynamics.

Another type of GP buyer behavior that was different from that in Market II was that, when
the population size was increased to 50, both GP buyers increased their usage of the higher-profit strategies. In other words, the more intelligent GP buyers learned to cooperate with each other instead of competing with each other, which was the behavior of the two GP buyers in Market II. More intelligent GP buyers thus also exhibited different behaviors in this market.

CONCLUDING REMARKS

In all three markets we have studied, the co-evolution of two self-interested GP buyers has produced more profitable transactions than when there was only one GP buyer in the market. This phenomenon was also observed in our previous work (Chen, Zeng, & Yu, 2009; Chen, Zeng, & Yu, 2009a): the overall buyer profit increases as the number of GP buyers increases in the market studied. In other words, an individual pursuing his own self-interest also promotes the good of his community as a whole. Such behavior is similar to that of humans in real markets, as argued by Adam Smith. Although we have only studied the case where only buyers have GP learning ability, this result suggests that, to some degree, the GP trader agents have qualities similar to humans in decision-making. Meanwhile, the co-evolution dynamics in the devised artificial DA market resembles the dynamics of real markets. We will continue to investigate the market dynamics when both buyers and sellers have GP learning ability.

Our analysis of the GP-evolved strategies shows that individual GP buyers with different degrees of rationality may exhibit different behavior depending on the type of market they are in. In Market I, where all buyers have the same token values and all sellers have the same token values, the behavioral difference is not significant. However, in Markets II and III, where the supply and demand prices leave room to exploit a higher profit, more intelligent GP buyers exhibit more intelligent behavior, such as using higher-profit strategies more frequently or cooperating with each other to earn more profit. In (Chen, Zeng, & Yu, 2009; Chen, Zeng, & Yu, 2009a), a similar GP buyer behavioral difference was reported in the market studied. This suggests that the intelligent behavior of a GP trader becomes visible when the market has a profit zone to explore.

All of the observed individual traders' learning behaviors make intuitive sense. Under the devised artificial DA market platform, GP agents demonstrate human-like rationality in decision-making. We plan to conduct human subject experiments to validate these results in the near future.
Cliff, D., & Bruten, J. (1997). Zero is not enough: On the lower limit of agent intelligence for continuous double auction markets (Technical Report HP-97-141). HP Technical Report.

Edmonds, B. (1998). Modelling socially intelligent agents. Applied Artificial Intelligence, 12, 677–699. doi:10.1080/088395198117587

Gode, D. K., & Sunder, S. (1993). Allocative efficiency of markets with zero-intelligence traders: markets as a partial substitute for individual rationality. The Journal of Political Economy, 101, 119–137. doi:10.1086/261868

Smith, V. (1976). Experimental economics: induced value theory. The American Economic Review, 66(2), 274–279.

KEY TERMS AND DEFINITIONS

Agent-based Modeling: A class of computational models for simulating the actions and interactions of autonomous agents (both individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole.
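As a minimal illustration of this definition (not taken from the chapter, and with all names purely hypothetical), an agent-based model reduces to a population of autonomous decision rules iterated over time while a system-level quantity is recorded:

# Hypothetical minimal agent-based model: each agent adjusts its state toward
# a noisy observation of the population average, and we record a system-level
# outcome (the dispersion of states) that emerges from the interactions.
import random

class Agent:
    def __init__(self):
        self.state = random.uniform(0.0, 1.0)

    def step(self, population_mean):
        observed = population_mean + random.gauss(0.0, 0.05)
        self.state += 0.1 * (observed - self.state)

def run(n_agents=100, n_steps=50):
    agents = [Agent() for _ in range(n_agents)]
    for _ in range(n_steps):
        mean = sum(a.state for a in agents) / len(agents)
        for a in agents:
            a.step(mean)
    # the system-wide effect of the individual interactions
    return max(a.state for a in agents) - min(a.state for a in agents)

print(run())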
Chapter 6
Social Simulation with
Both Human Agents and
Software Agents:
An Investigation into the
Impact of Cognitive Capacity
on Their Learning Behavior
Shu-Heng Chen
National Chengchi University, Taiwan
Chung-Ching Tai
Tunghai University, Taiwan
Tzai-Der Wang
Cheng Shiu University, Taiwan
Shu G. Wang
National Chengchi University, Taiwan
ABSTRACT
In this chapter, we will present agent-based simulations as well as human experiments in double auction markets. Our idea is to investigate the learning capabilities of human traders by studying learning agents constructed with Genetic Programming (GP); the latter can further serve as a design platform for conducting human experiments. By manipulating the population size of GP traders, we attempt to characterize the innate heterogeneity in human beings' intellectual abilities. We find that GP traders are efficient in the sense that they can beat other trading strategies even with a very limited learning capacity. A series of human experiments and multi-agent simulations are conducted and compared at the end of this chapter.
DOI: 10.4018/978-1-60566-898-7.ch006
… agents with Genetic Programming (GP) and then we manipulate their "cognitive capacity" or "computational capacity" by assigning GP traders with populations of different sizes.

Unlike common practices, we do not construct our agents based on the results of human experiments because human factors are so difficult to control and observe, and therefore it might not be easy to elicit definitive conclusions from human experiments. Instead, we adopted the Herbert Simon way of studying human behavior—understanding human decision processes by conducting computer simulations. Thus agent-based simulations in this chapter are not only research instruments used to test our conjecture, but they also serve as the design platform for human experiments.

We first model learning agents with Genetic Programming, and the population sizes of GP traders are regarded as their cognitive capacity. These GP traders are sent to the double auction markets to compete with other designed strategies. With the discovery of the capability of GP learning agents, we further conduct human experiments where human traders encounter the same opponents as GP agents did. By comparing the behavior and learning process of human traders with those of GP agents, we have a chance to form a better understanding of human learning processes.

This chapter is organized as follows: Section 2 will introduce related research to supply the background knowledge needed for this study. Section 3 depicts the experimental design, including the trading mechanism, software trading strategies, the design of GP learning agents, and experimental settings. The results, evaluations, and analysis of the experiments are presented in Section 4. Section 5 provides the concluding remarks.

LITERATURE REVIEW

In this section, we will present a series of related studies, which inspired the research behind this chapter or provided the foundation of our research method. First, we will go through several double auction experiments to support our research question. Second, we will conduct a brief survey of how cognitive capacity is found to be decisive in human decision making. In the end, we will talk about how to model cognitive capacity in agent-based models.

Trading Tournaments in Double Auction Markets

The pioneering work in exploring individual characteristics of effective trading strategies in double auction markets consists of Rust, Miller, & Palmer (1993, 1994)'s tournaments held in the Santa Fe Institute. Rust, Miller, & Palmer (1993, 1994) collected 30 trading algorithms and categorized them according to whether they were simple or complex, adaptive or non-adaptive, predictive or non-predictive, stochastic or non-stochastic, and optimizing or non-optimizing.

Rust, Miller, & Palmer (1993, 1994) conducted the double auction tournament in a very systematic way. They proposed a random token generation process to produce the demand and supply schedules needed in their tournaments. A large amount of simulations which cover various kinds of market structures were performed, and an overall evaluation was made to distinguish effective strategies from poor ones.

The result was rather surprising: the winning strategy was simple, non-stochastic, non-predictive, non-optimizing, and most importantly non-adaptive. In spite of this, other strategies possessing the same characteristics still performed poorly. As a result, it remains an open question "whether other approaches from the literature on artificial intelligence might be sufficiently powerful to discover effective trading strategies." (Rust, Miller, & Palmer, 1994, pp. 94–95)

It is important to note that there are certain sophisticated strategies in Rust, Miller, & Palmer (1993, 1994)'s tournaments, and some of them
even make use of an artificial intelligence algorithm as the learning scheme. Compared to simple strategies, such learning agents did not succeed in improving their performance within a reasonable period of time. Therefore Rust, Miller, & Palmer (1994) deemed that humans may perform better because they can generate good strategies based on very limited trading experiences.

The comparisons of learning agents versus designed strategies have assumed a different form in a series of human-agent interaction studies. Three projects will be introduced here, including Das, Hanson, Kephart, & Tesauro (2001), Taniguchi, Nakajima, & Hashimoto (2004), and Grossklags & Schmidt (2006).

Das, Hanson, Kephart, & Tesauro (2001) employed a continuous double auction market as the platform and had human traders compete with two software trading strategies—ZIP and GD. ZIP is an adaptive strategy proposed by Cliff & Bruten (1997), and a GD strategy is proposed by Gjerstad & Dickhaut (1998). In order to investigate the potential advantage of software strategies due to their speed, Das, Hanson, Kephart, & Tesauro (2001) distinguished fast agents from slow agents by letting slow agents 'sleep' for a longer time. Human traders encounter three kinds of opponents, namely GD Fast, ZIP Fast, and ZIP Slow opponents in Das, Hanson, Kephart, & Tesauro (2001)'s experiments.

The results show that regardless of whether the agents are fast or slow, they all surpass human traders and keep a very good lead. Although human traders seem to improve over time, they still cannot compete with software strategies at the end of the experiments.

The superiority of software strategies is further supported by Taniguchi, Nakajima, & Hashimoto (2004)'s futures market experiments. Taniguchi, Nakajima, & Hashimoto (2004) use the U-Mart futures market as the experimental platform where human traders and random bidding agents compete to buy or sell contracts at the same time. Before the experiments, Taniguchi, Nakajima, & Hashimoto (2004) trained their human subjects with knowledge about the futures and stock markets, related technical and fundamental trading strategies, and the operations of the trading interface for 90 minutes. In addition to this arrangement, the trading mechanism of U-Mart, named "Itayose," is special in that the market matches the outstanding orders every 10 seconds. As a result, human traders have more time to contemplate and make their bids or offers. Both of the above designs enable human traders to have more advantages to compete with the randomly bidding software strategies.

However, the results show that human traders have poorer performance than software agents, although there is a human trader who learns to speculate and can defeat software strategies. In spite of their results, Taniguchi, Nakajima, & Hashimoto (2004)'s experiments still exhibit the possibility of defeating software strategies with human intelligence.

Unlike previous studies, Grossklags & Schmidt (2006)'s research question is more distinguishing: they want to know whether human traders will behave differently when they know there are software agents in the same market. In their futures markets, they devised a software agent called "Arbitrageur." Arbitrageur's trading strategy is simple: sell bundles of contracts when their prices are above the reasonable price, and buy bundles of contracts when their prices are below the reasonable price. This is a very simple strategy which human traders may also adopt. However, software agents make positive profits in 11 out of the total of 12 experiments. Because this is a zero-sum game, the software agents' positive performance means that losses are incurred by human traders, although the differences are not statistically significant.

Similar to Rust, Miller, & Palmer (1993, 1994)'s results, Grossklags & Schmidt (2006)'s results, together with Das, Hanson, Kephart, & Tesauro (2001)'s and Taniguchi, Nakajima, & Hashimoto (2004)'s findings, demonstrate a
general picture in which it is difficult for human traders to compete with software agents even if the software agents are very simple. Learning agents (either software ones or humans) can hardly defeat designed strategies in a short period of time. Nevertheless, we can also observe from these experiments that learning agents (either software or human) may have the chance to defeat software strategies if they have enough time to learn. Then some questions naturally emerge: in what situations can learning agents outperform other software strategies? Is there any mechanism which has an influence on learning agents' learning behavior?

To answer the first question, we need to conduct experiments where learning agents can exert all their potential to win the game. Considering the cost of testing human traders in various situations with long time horizons, we adopt another approach: we conduct agent-based simulations with learning agents to examine their winning conditions first, and then we run human experiments to see how things develop when learning software traders are replaced by learning humans. In selecting an appropriate algorithm to model our learning agent, we choose Genetic Programming because, as Chen, Zeng, & Yu (2008)'s research shows, GP traders can evolve and adapt very efficiently in a double auction market.

Cognitive Ability and Learning Behavior

The answer to the second question raised at the end of the last section is not so obvious. To find possible factors influencing people's learning behavior, we have to consult science disciplines which have paid much attention to this issue. Fortunately, this question has been investigated by psychologists and decision scientists for a long time.

To look into possible factors of learning, we have to realize that the reason why people have to learn lies in their bounded rationality. Because humans are boundedly rational, they cannot find the optimal solutions at the beginning and have to improve their performance based on their experiences.

Information and cognitive capacity are two important sources of bounded rationality for human decision makers. While economists, either theorists or experimentalists, have mainly emphasized the importance of information, the significance of cognitive capacity has been temporarily mislaid but has started to regain its position in economics experiments in recent years.

Some of the earliest experimental ideas concerning cognitive capacity came from Herbert Simon, who was the initiator of bounded rationality and was awarded the Nobel Memorial Prize in Economics. In the "concept formation" experiment (Gregg & Simon, 1979) and the arithmetic problem (Simon, 1981), Simon pointed out that the problem is strenuous or even difficult to solve, not because human subjects did not know how to solve the problem, but mainly because without decision supports such as paper and pencil, such tasks can easily overload human subjects' working memory capacity and influence their performance (Simon, 1981, 1996).

In the realm of psychology, Payne, Bettman, & Johnson (1993)'s research can be a good foundation for our research. Payne, Bettman, & Johnson (1993) pointed out that humans have different strategies for solving a specific problem, and humans will choose the strategy by considering both accuracy and the cognitive effort they are going to make. In the end, it is dependent on the cognitive capacity of human decision makers.

In addition to the above conjectures and theories, more concrete evidence is observed in economic laboratories. Devetag & Warglien (2003) found a significant and positive correlation between subjects' short-term memory scores and conformity to standard game-theoretic prescriptions in the games. Benjamin, Brown, & Shapiro (2006) imposed a cognitive load manipulation by asking subjects to remember a seven-digit number
while performing the task. The results showed that the cognitive load manipulation caused a statistically significant increase in one of two measures of small-stakes risk aversion. Devetag & Warglien (2008) pointed out that subjects construct representations of games of different relational complexity and will play the games according to these representations. Their experimental results showed that both the differences in the ability to correctly represent the games and the heterogeneity of the depth of iterated thinking in games appear to be correlated with short-term memory capacity.

Consequently, we choose to have working memory capacity as the representative of cognitive capacity in this study. In order to obtain this inherent variable, we will give several tests to our subjects to measure their working memory capacity, apart from manipulating it by imposing cognitive loading tasks.

Cognitive Ability in Agent-based Models

We have seen from the last section that cognitive capacity, or more specifically, working memory, could be an important factor that has an influence on learning capability. Nevertheless, it is rarely mentioned in agent-based economic models. To the best of our knowledge, the only exception is Casari (2004)'s model about adaptive learning agents with limited working memory. Casari (2004) used Genetic Algorithms (GA) as agents' learning algorithms, and made use of the size of each agent's strategy set. The results show that the model replicates most of the patterns found in common property resource experiments.

Being inspired by different psychological studies, we adopt a similar mechanism to model learning agents' cognitive capacity in this research. We employ GP as traders' learning algorithms so that they can construct new strategies or modify old ones based on past experiences. The limits of working memory are concretized as GP traders' population sizes. The bigger the population a GP trader has, the more capable it is of handling various concepts and structures to form its trading strategies.

EXPERIMENTAL DESIGN

In this chapter, we will report two kinds of experimental results. One is from agent-based simulations, and the other is from human experiments.

The idea of agent-based simulation in this chapter is to understand human dynamics with the tool of GP agents. The purpose of such simulations is two-fold. First, we let GP agents compete with other software trading strategies to see the potential of learning agents, and observe the conditions when learning agents can defeat other designed strategies. Second, we can test the influences of cognitive capacity by imposing different population sizes on our GP learning agents. Such manipulation of cognitive capacity is almost impossible with human subjects, and thus it will be very informative if we can have simulated results before we eventually perform human experiments.

A human experiment is conducted after the simulations. The results of human experiments will be compared with simulation results to verify whether we have found an adequate way to model human learning processes. Parameters and the design of both experiments will be presented in this section.

Market Mechanism

Experiments in this chapter were conducted on the AIE-DA (Artificial Intelligence in Economics - Double Auction) platform, which is an agent-based discrete double auction simulator with built-in software agents.

AIE-DA is inspired by the Santa Fe double auction tournament held in 1990, and in this study we adopted the same token generation process as
in Rust, Miller, & Palmer (1993, 1994)'s design. Our experimental markets consist of four buyers and four sellers. Each of the traders can be assigned a specific strategy–either a designed trading strategy or a GP agent.

During the transactions, traders' identities are fixed so they cannot switch between buyers and sellers. Each trader has four units of commodities to buy or to sell, and can submit only once for one unit of commodity at each step in a trading day. Every simulation lasts 7,000 trading days, and each trading day consists of 25 trading steps. AIE-DA is a discrete double auction market and adopts AURORA trading rules such that at most one pair of traders is allowed to make a transaction at each trading step. The transaction price is set to be the average of the winning buyer's bid and the winning seller's ask.

At the beginning of each simulation, each trader will be randomly assigned a trading strategy or a GP agent. Traders' tokens (reservation prices) are also randomly generated with random seed 6453. Therefore, each simulation starts with a new combination of traders and a new demand and supply schedule.

Software Strategies

In order to test the ability of GP agents, we programmed several trading strategies from the double auction literature as GP agents' competitors:

• Truth Teller: Truth-telling traders who simply use their reservation prices as their bids or asks.
• Kaplan, Ringuette, and Skeleton from Rust, Miller, & Palmer (1993, 1994)'s tournament: Skeleton is the strategy supplied to all entrants in their competition, and it makes safe bids/asks according to current bids or asks in the market. Kaplan and Ringuette were the best and second-best traders, respectively; their trading philosophy being to wait in the background and let others negotiate, and then steal the deal when the bids and asks got close enough. In our simulations, we modified these three strategies so that they became more conservative in their bids and offers: when they are going to send their orders to the market, they will choose a number based on their next token values instead of current ones, which means their bids and offers are less competitive but are more profitable if they succeed in trading.
• ZIC (Zero-Intelligence Constrained) from Gode & Sunder (1993): ZIC traders send random bids or asks to the market in a range bounded by their reservation prices; hence they can avoid transactions which incur losses.
• ZIP (Zero-Intelligence Plus) from Cliff & Bruten (1997): A ZIP trader forms bids or asks by a chosen profit margin, and tries to choose a reasonable profit margin by inspecting its status, the latest shout price, and whether the shouted prices are accepted or not.
• Markup from Zhan & Friedman (2007): Markup traders set up certain markup rates and consequently determine their shouted prices. In this chapter, the markup rate was set to be 0.1. We choose 0.1 because Zhan and Friedman's simulations show that the market efficiency will be maximized when traders all have 0.1 markup rates.
• Gjerstad-Dickhaut (GD) from Gjerstad & Dickhaut (1998): A GD trader scrutinizes the market history and calculates the possibility of successfully making a transaction with a specific shouted price by counting frequencies of past events. After that, the trader simply chooses a price as her bid/ask if it maximizes her expected profits.
• BGAN (Bayesian Game Against Nature) from Friedman (1991): BGAN traders treat the double auction environment as a game against nature. They form beliefs in other
traders' bids or asks distribution and then compute the expected profit based on their own reservation prices. Hence their bids/asks simply equal their reservation prices minus/plus the expected profit. Finally, BGAN traders employ Bayesian updating procedures to update their prior beliefs.
• Easley-Ledyard (EL) from Easley & Ledyard (1993): EL traders balance the profit and the probability of successfully making transactions by placing aggressive bids or asks in the beginning, and then gradually decrease their profit margin when they observe that they might lose chances based on other traders' bidding and asking behavior.
• Empirical strategy is inspired by Chan, LeBaron, Lo, & Poggio, and it works in the same way as Friedman's BGAN but develops its belief by constructing histograms from opponents' past shouted prices.

Named by or after their original designers, these strategies were modified to accommodate our discrete double auction mechanism in various ways. They were modified according to their original design concepts as much as possible. As a result, they might not be 100% the same as they originally were.

Although most of the strategies were created for the purpose of studying price formation processes, we still sent them to the "battlefield" because they can represent, to a certain degree, various types of trading strategies which can be observed in financial market studies.

GP Trading Agents

GP agents in this study adopt only standard crossover and mutation operations, by which it is meant that no election, ADFs (Automatic Defined Functions), nor other mechanisms are implemented. We provide GP traders with simple but basic market information as their terminals, such as traders' own reservation prices, current market shouts, and the average price in the last period, etc. We adopt the same design of genetic operations as well as terminal and function sets of GP traders as Chen, Chie, & Tai (2001) describes, apart from a different fitness calculation. The fitness value of GP traders is defined as the individual efficiency achieved, which will be explained later in this chapter.

We did not train our GP traders before they were sent to the double auction tournament. At the beginning of every trading day, each GP trader randomly picks a strategy from his/her population of strategies and uses it throughout the whole day. The performance of each selected strategy is recorded, and if a specific strategy is selected more than once, a weighted average will be taken to emphasize later experiences.

GP traders' strategies are updated–with selection, crossover, and mutation–every N days, where N is called the "select number." To avoid the flaw that a strategy is deserted simply because it was not selected, we set N as twice the size of the population so that theoretically each strategy has the chance to be selected twice. Tournament selection is implemented and the size of the tournament is 5, however big the size of the population is. We also preserve the elite for the next generation, and the size of the elite is 1.¹ The mutation rate is 5%, in which 90% of this operation consists of a tree mutation.²

Experimental Procedures

Since we have only eight traders (four buyers and four sellers) in the market while there are twelve trading strategies to be tested, we have to compare these strategies by randomly sampling (without replacement) eight strategies and injecting them into the market one at a time. However, considering the vast amount of combinations and permutations of strategies, we did not try out all the possibilities. Instead, 300 random match-ups were created for each series of experiment. In …
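To make the mechanics described in the Market Mechanism and Software Strategies subsections concrete, the following is a hedged sketch (not the authors' AIE-DA code; names, the price ceiling, and signatures are illustrative assumptions) of one clearing step and of the two simplest strategies in the list above, ZIC and Markup:

# A discrete double auction step in the spirit of the AURORA rule described
# above: at most one pair trades per step, and the transaction price is the
# average of the winning bid and the winning ask.
import random

def zic_shout(reservation_price, is_buyer, price_ceiling=400):
    # ZIC: a random shout bounded by the trader's reservation price, so a
    # completed trade can never produce a loss (price_ceiling is assumed).
    if is_buyer:
        return random.uniform(0, reservation_price)
    return random.uniform(reservation_price, price_ceiling)

def markup_shout(reservation_price, is_buyer, markup=0.1):
    # Markup: shade the reservation price by a fixed markup rate
    # (0.1 here, following Zhan and Friedman's efficiency-maximizing value).
    return reservation_price * (1 - markup) if is_buyer else reservation_price * (1 + markup)

def clear_one_step(bids, asks):
    best_bid, best_ask = max(bids), min(asks)
    if best_bid >= best_ask:
        return (best_bid + best_ask) / 2.0   # transaction price
    return None                              # no transaction this step

# Example: a buyer with token 120 and a seller with token 100 using ZIC shouts.
bid = zic_shout(120, is_buyer=True)
ask = zic_shout(100, is_buyer=False)
print(clear_one_step([bid], [ask]))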
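The update cycle of a GP trader described in the GP Trading Agents subsection can likewise be sketched. Real strategies are GP parse trees; in this hedged illustration a strategy is reduced to an opaque genome with a recorded fitness (individual efficiency), which is enough to show the daily selection, the weighted performance record, and the generational update (select number N = 2 × population size, tournament size 5, elite size 1, 5% mutation). All names are assumptions, not the authors' implementation.

import random

class GPTrader:
    def __init__(self, pop_size):
        self.pop_size = pop_size
        self.select_number = 2 * pop_size            # N: days between updates
        self.population = [self.random_strategy() for _ in range(pop_size)]
        self.fitness = [0.0] * pop_size               # weighted individual efficiency
        self.uses = [0] * pop_size

    def random_strategy(self):
        return {"genome": [random.random() for _ in range(5)]}

    def pick_for_today(self):
        # a strategy is drawn at random and used for the whole trading day
        return random.randrange(self.pop_size)

    def record(self, idx, efficiency):
        # weighted average that puts more weight on later experiences (assumed form)
        self.uses[idx] += 1
        w = 2.0 / (self.uses[idx] + 1)
        self.fitness[idx] = (1 - w) * self.fitness[idx] + w * efficiency

    def tournament(self, size=5):
        contenders = random.sample(range(self.pop_size), min(size, self.pop_size))
        return max(contenders, key=lambda i: self.fitness[i])

    def next_generation(self, mutation_rate=0.05):
        elite = max(range(self.pop_size), key=lambda i: self.fitness[i])
        new_pop = [self.population[elite]]            # elitism, size 1
        while len(new_pop) < self.pop_size:
            p1 = self.population[self.tournament()]
            p2 = self.population[self.tournament()]
            cut = random.randrange(1, len(p1["genome"]))
            child = {"genome": p1["genome"][:cut] + p2["genome"][cut:]}   # crossover
            if random.random() < mutation_rate:       # 5% mutation
                child["genome"][random.randrange(len(child["genome"]))] = random.random()
            new_pop.append(child)
        self.population = new_pop
        self.fitness = [0.0] * self.pop_size
        self.uses = [0] * self.pop_size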
Learning Capabilities of GP Agents

In investigating the GP traders' learning capability, we simply compare GP agents with designed strategies collected from the literature. We are interested in the following questions:

1. Can GP traders defeat other strategies?
2. How many resources are required for GP traders to defeat other strategies?

GP traders with population sizes of 5, 20, and 50 are sampled to answer these questions. Figure 3 is the result of this experiment. Here we represent GP traders of population sizes 5, 20, and 50 with P5, P20, and P50, respectively. We have the following observations from Figure 3:

• No matter how big the population is, GP traders can gradually improve and defeat other strategies.
• GP traders can still improve themselves even under the extreme condition of a population of only 5. The fact that the tournament size is also 5 means that strategies in the population might converge very quickly. Figure 4 shows the evolution of the average complexity of GP strategies. In the case of P5, the average complexity almost equals 1 at the end of the experiments, meaning that GP traders could still gain superior advantages by constantly updating their strategy pools composed of very simple heuristics. In contrast with P5, in the case of bigger populations, GP develops more complex strategies as time goes by.
• What is worth noticing is that GP might need a period of time to evolve. The bigger the population, the fewer the generations that are needed to defeat other strategies. In any case, it takes hundreds to more than a thousand days to achieve good performances for GP traders.
• Figure 3 also shows the results from a profit-variation viewpoint. Other things being equal, a strategy with higher profit and less variation is preferred. Therefore, one can draw a frontier connecting the most efficient trading strategies. Figure 3 shows that GP traders, although with more variation in profits in the end, always occupy the ends of the frontier.³

The result of this experiment shows that learning GP traders can outperform other (adaptive) strategies, even if those strategies may have a more sophisticated design.

Cognitive Capacity and Learning Speed

Psychologists tell us that the intelligence of human beings involves the ability to "learn quickly and learn from experiences" (Gottfredson, 1997). To investigate the influence of individual intelligence on learning speed, we think of a GP trader's population size as a proxy for his/her cognitive capacity. Is this parameter able to generate behavioral outcomes consistent with what psychological research tells us?

Figure 5 delineates GP traders' learning dynamics with a more complete sampling. Roughly speaking, we can see that the bigger the population size, the less time that GP traders need to perform well. In other words, GP traders with higher cognitive capacity tend to learn faster and consequently gain more wealth.

However, if we are careful enough, we may also notice that this trend is not as monotonic as we might think. It seems that there are three groups of learning dynamics in this figure. From P5 to P30, there exists a clearly positive relationship between "cognitive capacity" and performance. P40 and P50 form the second group: they are not very distinguishable, but both of them are better than traders with lower "cognitive capacity". The most unexplainable part is P60 to P100. Although
Figure 3. Comparison of GP Traders with Designed Strategies. From the top to the bottom rows are
comparisons when GP traders’ population sizes are 5, 20, and 50, respectively. (a) The left panels of
each row are the time series of individual efficiencies. (b) The right panels of each row are the profit-
variation evaluation on the final trading day. The horizontal axis stands for their profitability (individual
efficiency, in percentage terms), and the vertical axis stands for the standard deviation of their profits.
Figure 4. The Average Complexities of GP Strategies. The GP traders’ population sizes are 5, 20, and 50,
respectively (from the left panel to the right panel). The complexity is measured in terms of the number
of terminal nodes and function nodes of GP traders’ strategy parse trees.
Figure 5. GP Traders’ Performances at Different Levels of Cognitive Capacity. The horizontal axis
denotes generations; the vertical axis consists of the individual efficiencies obtained by GP traders.
Table 1. Wilcoxon Rank Sum Tests for GP Traders’ Performances on Individual Efficiencies
this group apparently outperforms traders with lower "cognitive capacity," the inner-group relationship between "cognitive capacity" and performance is quite obscure.

For a better understanding of this phenomenon, a series of nonparametric statistical tests were performed upon these simulation results. The outcomes of these tests are presented in Table 1. Pairwise Wilcoxon Rank Sum Tests show that when the "cognitive capacity" levels are low, small differences in cognitive capacity may result in significant differences in final performances. On the contrary, among those who have high cognitive capacity, differences in cognitive capacity do not seem to cause any significant discrepancy in performances. Therefore, there seems to be a decreasing marginal contribution in terms of performance.

This phenomenon can be an analogy of what the psychological literature has pointed out: high intelligence does not always contribute to high performance–the significance of intelligent performance is more salient when the problems are more complex. As to the decreasing marginal value of intelligence, please see Detterman & Daniel (1989) and Hunt (1995).

Human Subject Experiments

As mentioned in the section on experimental design, we conduct multi-agent simulations with GP traders for the three markets specified in Figure 1. In order to make it easier to observe and compare, we choose GP traders with population sizes based on a log scale: 5, 25, and 125. Figure 6 depicts the evolution of GP traders' performance over time.

As Figure 6 shows, GP traders learn very quickly, but they attain different levels of individual efficiencies in different markets. Does GP traders' cognitive capacity (population size) play any decisive role in their performances? To have a more precise description, detailed test statistics are computed and the output is presented in Table 2.

It is shown in Table 2 that GP traders with different cognitive capacity do not have significant differences in their performances in market 1, while the differences in their cognitive capacity do bring about significant discrepancies in final performances in market 3—the bigger the population size, the better the results they can achieve. Market 2 is somewhere in between market 1 and market 3.

After a quick overview of GP traders' performance in these three markets, we now turn our attention to the results of human experiments. Unlike GP traders, it is impossible to know human traders' true cognitive capacity. Fortunately, we can have access to it via various tests which have been validated by psychologists. In our human experiments, we have twelve human subjects recruited from among graduate and undergraduate students. We measure their cognitive capacity with five working memory tests (see Table 3). In Table 3, we normalize subjects' working memory scores so that a negative number means their working memory capacity is below the average of the twelve subjects.

Each subject was facing seven truth-telling opponents in their own auction markets, and three markets (M1, M2, and M3, see Figure 1) were experienced in order by each trader. The dynamics of the human traders' performance in terms of individual efficiency is plotted in Figure 7.

We have several observations from Figure 7:

1. Human traders have quite diverse learning patterns in market 1 and market 2, but the patterns appear to be more similar. This may be due to the idiosyncrasies of the markets, or it may be due to the learning effect of human traders so that they have come up with more efficient strategies market by market.
2. Although there are some exceptions, human traders who have above-average working memory capacity seem to have better performance than those with below-average working memory capacity. We can see from
Figure 6. GP Traders’ Performances over Time in Market 1, Market 2, and Market 3. The horizontal
axis denotes generations; the vertical axis consists of the individual efficiencies (in percentage terms)
obtained by GP traders.
Table 2. Wilcoxon Rank Sum Tests for GP Traders' Performances in M1, M2, and M3

         P5                       P25                   P125
P5       X
P25      M1: 0.2168               X
         M2: 0.3690
         M3: 0.004758**
P125     M1: 0.1416               M1: 0.3660            X
         M2: 0.003733**           M2: 0.1467
         M3: 0.00000007873**      M3: 0.0004625**

The numbers in each cell are the p-values for the null hypothesis of no influence resulting from the difference in population sizes in M1, M2, and M3, respectively (from the top to the bottom). "*" denotes significant results under the 10% significance level; "**" denotes significant results under the 5% significance level.
the figure that the solid series tend to lie in a higher position than the dashed series.
3. On average, it takes human traders less than six trading periods to surpass 90%. GP traders' learning speed is about the same: it takes GP traders less than ten generations to achieve similar levels. However, we have to notice that a GP trader's generation consists of several trading periods—10 periods for P5, 50 periods for P25, and 250 periods for P125.

Thus there seems to be a big difference between their learning speeds. If we are going to have GP traders compete with human traders in the same market, we can obviously observe the difference in their learning speeds.

Although there is a difference in the GP traders' and human traders' learning speeds, suggesting that human traders may have different methods or techniques of updating and exploring their strategies, it is still possible to modify GP traders to catch up with human traders. However, what is important in this research is to delve into
Figure 7. Human Traders’ Performances over Time in the Three Markets. Solid series are the learning
dynamics of human traders whose working memory capacity is above average; dashed series are the
learning dynamics of traders whose memory capacity is below average (the left panel). The right panels
are the average performances of above- and below-average traders. The horizontal axis denotes genera-
tions; the vertical axis denotes the individual efficiencies (in percentage terms) obtained by the traders.
the relationship between cognitive capacity and learning behavior. Do our GP traders exhibit patterns corresponding to those of human traders?

Since we have seen that GP traders' cognitive capacity does not play a significant role in market 1 and market 2, but that it has a positive and significant influence on performance in market 3, will it be the same in our human experiment? Because we cannot precisely categorize human traders according to their normalized scores, we choose to run linear regressions to see how working memory capacity contributes to human traders' performances. The estimated coefficients are not significant, and the explanatory power of the regression model is very poor. However, we can go back to the raw data and see what might be neglected in our analysis.
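The analysis just described amounts to two simple steps: standardizing the working-memory scores (so that a negative value means below-average capacity, as in Table 3) and regressing individual efficiency on the standardized score. The following hedged sketch uses purely hypothetical numbers, not the chapter's data:

from statistics import mean, pstdev

wm_raw = [35, 42, 28, 51, 39, 44, 30, 47, 41, 33, 38, 49]      # hypothetical test scores
efficiency = [88, 95, 80, 97, 90, 93, 82, 96, 91, 85, 89, 94]  # hypothetical efficiencies (%)

# standardize: negative => below the group average of the twelve subjects
mu, sigma = mean(wm_raw), pstdev(wm_raw)
wm_z = [(x - mu) / sigma for x in wm_raw]

# ordinary least squares of efficiency on the standardized score
x_bar, y_bar = mean(wm_z), mean(efficiency)
beta = sum((x - x_bar) * (y - y_bar) for x, y in zip(wm_z, efficiency)) / \
       sum((x - x_bar) ** 2 for x in wm_z)
alpha = y_bar - beta * x_bar
ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(wm_z, efficiency))
ss_tot = sum((y - y_bar) ** 2 for y in efficiency)
print("slope:", round(beta, 2), "R^2:", round(1 - ss_res / ss_tot, 2))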
Table 4. Results of Linear Regression of Working Memory Capacity on Human Traders’ Performances
When we try to emphasize the potential influence of cognitive capacity (here we mean the working memory capacity) on human traders' performances, we are suggesting that cognitive capacity may play a key role in the processing of information as well as the combination and construction of strategies. The assumption here is that people have to get acquainted with the problems and form their strategies from the beginning. However, this may be far from true, because people come to the lab with different background knowledge and different experiences, and experimentalists usually control for this by excluding subjects who have participated in similar experiments. But how can experienced subjects be excluded if their experience did not come from participating in similar experiments before?

In this study, we can approach this problem by excluding subjects who have participated in markets which use double auctions as their trading mechanisms. From a survey after the experiments, we can identify three subjects who have experience in stock markets or futures markets.⁴ Following this logic, we re-examine the relationship between the working memory capacity and human traders' performance, and the results are shown in Table 4.

As Table 4 shows, the working memory capacity only has a significant influence on traders' performances in market 3. We can compare this with the GP traders' results shown in Table 2. The results from the GP simulations tell us that the influences of cognitive capacity are significant in market 3, while they are insignificant in market 1, and only significant in market 2 when the difference in cognitive capacity is 25 times large. In brief, we have very similar patterns for the GP trader simulations and human subject experiments.

Does this prove anything related to our research goals? We have to realize that there are limitations on our analysis so far, but the limits to our research may not necessarily work against our analytical results; rather, they may suggest that we need more experiments and more evidence to clarify the entwined effects occurring during human traders' decision-making processes. We name several of them as follows:

1. The number of human subjects greatly limits the validity of our research. As a result, we have to conduct more human experiments to gain stronger support for our analytic results.
2. Does the significance of working memory capacity appear because of its influences in decision making, or is it because of a learning effect taking place when human subjects proceed from market 1 to market 3? We cannot identify the effects of various possible channels.
3. What does the pattern observed in GP simulations and human experiments mean, even if working memory capacity can really
… experiments. However, even when being 'combined' together, human experiments are always the counselors of agent-based models, just as Duffy (2006) observes:

"with a few notable exceptions, researchers have not sought to understand findings from agent-based simulations with follow-up experiments involving human subjects. The reasons for this pattern are straightforward. … As human subject experiments impose more constraints on what a researcher can do than do agent-based modeling simulations, it seems quite natural that agent-based models would be employed to understand laboratory findings and not the other way around." (Duffy, 2006, p. 951)

Human experiments may be greatly constrained, but at the same time there are so many unobservable but intertwining factors functioning during human subjects' decision processes. On the other hand, agent-based models can be strictly controlled, and software agents are almost transparent. Thus, it would be an advantage to turn to agent-based simulations if researchers want to isolate the influence of a certain factor. In this regard, we think that agent-based simulations can also be a tool to discover unknown factors even before human experiments are conducted.

In this chapter, we actually follow this strategy by eliciting ideas from the psychological literature first, transplanting them into an economics environment in the form of agent-based simulations, and finally conducting corresponding human experiments after we have gained support from agent-based simulations. However, what we mean by multi-directional relationships among agent-based simulation, human experiments, and psychology has a deeper meaning. We believe that the knowledge from these three fields has a large space for collaboration, but it should be done not only by referring to the results of each other as a final source of reference. These three disciplines each possesses an experimental nature, and a cyclical joint work including the design phase should be expected. We anticipate that researchers can acquire more precise findings by experimentation with the help from human subjects and software agents in a way delivered in this chapter.

ACKNOWLEDGMENT

The authors are grateful to an anonymous referee for very helpful suggestions. The authors are also thankful to Prof. Lee-Xieng Yang in the Research Center of Mind, Brain, and Learning of National Chengchi University for his professional support and test programs of working memory capacity. The research support in the form of NSC grants no. NSC 95-2415-H-004-002-MY3 and NSC 96-2420-H-004-016-DR from the National Science Council, Taiwan, is also gratefully acknowledged.

REFERENCES

Benjamin, D., Brown, S., & Shapiro, J. (2006). Who is 'behavioral'? Cognitive ability and anomalous preferences. Levine's Working Paper Archive 122247000000001334, UCLA Department of Economics.

Casari, M. (2004). Can genetic algorithms explain experimental anomalies? An application to common property resources. Computational Economics, 24, 257–275. doi:10.1007/s10614-004-4197-5

Chan, N. T., LeBaron, B., Lo, A. W., & Poggio, T. (2008). Agent-based models of financial markets: A comparison with experimental markets. MIT Artificial Markets Project, Paper No. 124, September. Retrieved January 1, 2008, from http://citeseer.ist.psu.edu/chan99agentbased.html.
Chen, S.-H., Chie, B.-T., & Tai, C.-C. (2001). Evolving bargaining strategies with genetic programming: An overview of AIE-DA Ver. 2, Part 2. In B. Verma & A. Ohuchi (Eds.), Proceedings of Fourth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2001) (pp. 55–60). IEEE Computer Society Press.

Chen, S.-H., Zeng, R.-J., & Yu, T. (2008). Co-evolving trading strategies to analyze bounded rationality in double auction markets. In Riolo, R., Soule, T., & Worzel, B. (Eds.), Genetic Programming Theory and Practice VI (pp. 195–213). Springer.

Cliff, D., & Bruten, J. (1997). Zero is not enough: On the lower limit of agent intelligence for continuous double auction markets (Technical Report no. HPL-97-141). Hewlett-Packard Laboratories. Retrieved January 1, 2008, from http://citeseer.ist.psu.edu/cliff97zero.html

Das, R., Hanson, J. E., Kephart, J. O., & Tesauro, G. (2001). Agent-human interactions in the continuous double auction. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI). San Francisco, CA: Morgan-Kaufmann.

Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are highest for low-IQ groups. Intelligence, 13, 349–359. doi:10.1016/S0160-2896(89)80007-8

Devetag, G., & Warglien, M. (2003). Games and phone numbers: Do short-term memory bounds affect strategic behavior? Journal of Economic Psychology, 24, 189–202. doi:10.1016/S0167-…

Devetag, G., & Warglien, M. (2008). Playing the wrong game: An experimental analysis of relational complexity and strategic misrepresentation. Games and Economic Behavior, 62, 364–382. doi:10.1016/j.geb.2007.05.007

Duffy, J. (2006). Agent-based models and human subject experiments. In Tesfatsion, L., & Judd, K. (Eds.), Handbook of Computational Economics (Vol. 2). North Holland.

Easley, D., & Ledyard, J. (1993). Theories of price formation and exchange in double oral auction. In Friedman, D., & Rust, J. (Eds.), The Double Auction Market: Institutions, Theories, and Evidence. Addison-Wesley.

Friedman, D. (1991). A simple testable model of double auction markets. Journal of Economic Behavior & Organization, 15, 47–70. doi:10.1016/0167-2681(91)90004-H

Gjerstad, S., & Dickhaut, J. (1998). Price formation in double auctions. Games and Economic Behavior, 22, 1–29. doi:10.1006/game.1997.0576

Gode, D., & Sunder, S. (1993). Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. The Journal of Political Economy, 101, 119–137. doi:10.1086/261868

Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52 signatories, history, and bibliography. Intelligence, 24(1), 13–23. doi:10.1016/S0160-2896(97)90011-8

Gregg, L., & Simon, H. (1979). Process models and stochastic theories of simple concept formation. In H. Simon, Models of Thought (Vol. I). New Haven, CT: Yale University Press.
Grossklags, J., & Schmidt, C. (2006). Software agents and market (in)efficiency—a human trader experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part C, Special Issue on Game-theoretic Analysis & Simulation of Negotiation Agents, 36(1), 56–67.

Hunt, E. (1995). The role of intelligence in modern society. American Scientist, (July/August), 356–368.

Kagel, J. (1995). Auction: A survey of experimental research. In Kagel, J., & Roth, A. (Eds.), The Handbook of Experimental Economics. Princeton University Press.

Lewandowsky, S., Oberauer, K., Yang, L.-X., & Ecker, U. (2009). A working memory test battery for Matlab. Under preparation for submission to the Journal of Behavioral Research Method.

Payne, J., Bettman, J., & Johnson, E. (1993). The Adaptive Decision Maker. Cambridge University Press.

Rust, J., Miller, J., & Palmer, R. (1993). Behavior of trading automata in a computerized double auction market. In Friedman, D., & Rust, J. (Eds.), Double Auction Markets: Theory, Institutions, and Laboratory Evidence. Redwood City, CA: Addison Wesley.

Rust, J., Miller, J., & Palmer, R. (1994). Characterizing effective trading strategies: Insights from a computerized double auction tournament. Journal of Economic Dynamics & Control, 18, 61–96. doi:10.1016/0165-1889(94)90069-8

Simon, H. (1981). Studying human intelligence by creating artificial intelligence. American Scientist, 69, 300–309.

Simon, H. (1996). The Sciences of the Artificial. Cambridge, MA: MIT Press.

Taniguchi, K., Nakajima, Y., & Hashimoto, F. (2004). A report of U-Mart experiments by human agents. In Shiratori, R., Arai, K., & Kato, F. (Eds.), Gaming, Simulations, and Society: Research Scope and Perspective (pp. 49–57). Springer.

Zhan, W., & Friedman, D. (2007). Markups in double auction markets. Journal of Economic Dynamics & Control, 31, 2984–3005. doi:10.1016/j.jedc.2006.10.004

KEY TERMS AND DEFINITIONS

Genetic Programming (GP): An automated method for creating a working computer program from a high-level statement of a problem. Genetic programming starts with a population of randomly created computer programs. This population of programs is progressively evolved over a series of generations. The evolutionary search uses the Darwinian principle of natural selection (survival of the fittest) and analogs of various naturally occurring operations, including crossover (sexual recombination), mutation, etc.

Cognitive Capacity: A general concept used in psychology to describe humans' cognitive flexibility, verbal learning capacity, learning strategies, intellectual ability, etc. Although cognitive capacity is a very general concept and can be measured from different aspects with different tests, concrete concepts such as intelligence quotient (IQ) and working memory capacity are considered highly representative of this notion.

Double Auction: A system in which potential buyers submit their bids and potential sellers submit their ask prices (offers) simultaneously. The market is cleared when a certain price P is chosen so that all buyers who bid more than P and all sellers who ask less than P are matched to make transactions.

Working Memory: The mental resources used in the decision-making processes of humans; they are highly related to general intelligence. It is
generally assumed that working memory has a constrained capacity; hence this capacity plays an important role which determines people's performance in cognitive tasks, especially complex reasoning ones.

Boundedly Rational Agents: Agents who experience limits in formulating and solving complex problems and in processing (receiving, storing, retrieving, transmitting) information; therefore, they solve problems by using certain heuristics instead of optimizing.

Individual Efficiency: A ratio used to evaluate agents' performance in the markets. In economic theory, once demand and supply determine the equilibrium price, agents' potential profits (individual surplus) can be measured as the differences between his/her reservation prices and the equilibrium price. Individual efficiency is calculated as the ratio of agents' actual profits over their potential profits.

ENDNOTES

1. Elitism preserves the best strategy in the current population to the next. While elitism helps preserve good strategies when there is no guarantee that every strategy will be sampled in our design, we do not want it to be the main factor determining the compositions of the populations. Therefore, the number of elite is set to be 1.
2. Generally speaking, the larger the mutation rate, the more diverse the genotypes of the strategies are. In most studies, the mutation rate ranges from 1% to 10%; therefore it is set to be 5% in this research.
3. One may suspect that GP traders will perform very poorly from time to time since they also have the biggest variances in the profits. To evaluate how much worse GP traders can be, we keep track of the rankings of their performances relative to other trading strategies. As a result, the average rankings of GP traders are the smallest among all the designed trading strategies. This means that although GP traders may use not-so-good strategies sometimes, their performances are still barely adequate as compared with other kinds of designed trading strategies.
4. From the left panel of Figure 7, we can see that among the human traders with lower working memory capacity, there are about two traders who constantly perform quite well in every market. In fact, these traders are exactly those subjects with experience in stock markets or futures markets.
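For reference, the Individual Efficiency measure defined in the Key Terms above can be written directly in code. This is a hedged sketch under one reasonable reading of the definition (potential profit summed over the units that would trade profitably at the equilibrium price), with illustrative numbers only:

def individual_efficiency(tokens, trades, equilibrium_price, is_buyer=True):
    # potential profit: |token - equilibrium price| over the profitably tradable units
    if is_buyer:
        potential = sum(t - equilibrium_price for t in tokens if t > equilibrium_price)
        actual = sum(t - p for t, p in trades)      # trades = (token value, transaction price)
    else:
        potential = sum(equilibrium_price - t for t in tokens if t < equilibrium_price)
        actual = sum(p - t for t, p in trades)
    return actual / potential if potential > 0 else 0.0

# A buyer holding tokens 120, 110, 95, 90 with equilibrium price 100 who trades
# two units at prices 102 and 104 reaches (18 + 6) / (20 + 10) = 0.8, i.e. 80%.
print(individual_efficiency([120, 110, 95, 90], [(120, 102), (110, 104)], 100))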
Chapter 7
Evolution of Agents in a
Simple Artificial Market
Hiroshi Sato
National Defense Academy, Japan
Masao Kubo
National Defense Academy, Japan
Akira Namatame
National Defense Academy, Japan
ABSTRACT
In this chapter, we conduct a comparative study of various traders following different trading strategies.
We design an agent-based artificial stock market consisting of two opposing types of traders: “rational
traders” (or “fundamentalists”) and “imitators” (or “chartists”). Rational traders trade by trying to
optimize their short-term income. On the other hand, imitators trade by copying the majority behavior
of rational traders. We obtain the wealth distribution for different fractions of rational traders and
imitators. When rational traders are in the minority, they can come to dominate imitators in terms of
accumulated wealth. On the other hand, when rational traders are in the majority and imitators are in
the minority, imitators can come to dominate rational traders in terms of accumulated wealth. We show
that survival in a finance market is a kind of minority game between the two behavioral types, rational traders and
imitators. The coexistence of rational traders and imitators in different combinations may explain the
market’s complex behavior as well as the success or failure of various trading strategies. We also show
that successful rational traders are clustered into two groups: In one group traders always buy and their
wealth is accumulated in stocks; in the other group they always sell and their wealth is accumulated in
cash. However, successful imitators buy and sell coherently and their wealth is accumulated only in cash.
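As a toy illustration only (not the authors' model), the two behavioral types named in the abstract reduce to two decision rules: a rational trader compares the price with its expected fundamental value and trades in the profitable direction, while an imitator simply copies the majority action of the rational traders. The numbers below are purely hypothetical.

def rational_action(price, fundamental):
    # buy (+1) when the asset looks underpriced, sell (-1) when overpriced
    return +1 if price < fundamental else -1

def imitator_action(rational_actions):
    # copy whatever the majority of rational traders just did
    return +1 if sum(rational_actions) >= 0 else -1

rational_actions = [rational_action(p, 100) for p in (95, 103, 98, 97)]
print(rational_actions, imitator_action(rational_actions))   # [1, -1, 1, 1] 1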
On the other hand, Shleifer and his colleagues questioned the presumption that traders who misperceive returns do not survive (De Long, 1991). Since noise traders who are on average bullish bear more risk than do investors holding rational expectations, as long as the market rewards risk-taking, noise traders can earn a higher expected return even though they buy high and sell low on average. Because Friedman's argument does not take into account the possibility that some patterns of noise traders' misperceptions might lead them to take on more risk, it cannot be correct as stated.

It is difficult to reconcile the regular functioning of financial markets with the coexistence of different populations of investors. If there is a consistently winning market strategy, then it is reasonable to assume that the losing population will disappear in the long run. It was Friedman who first advanced the hypothesis that in the long run irrational investors cannot survive because they tend to lose wealth and disappear. For agents prone to forecasting errors, the fact that different populations with different trading strategies can coexist still requires an explanation.

Recent economic and finance research reflects growing interest in marrying the two viewpoints, that is, in incorporating ideas from the social sciences to account for the fact that markets reflect the thoughts, emotions, and actions of real people as opposed to the idealized economic investors who underlie efficient markets (LeBaron, 2000). Assumptions about the frailty of human rationality and the acceptance of such drives as fear and greed underlie the recipes developed over the decades in so-called technical analysis. There is growing empirical evidence of the existence of herd or crowd behavior. Herd behavior is often said to occur when many people take the same action, because some mimic the actions of others (Sornette, 2003).

To adequately analyze whether noise traders are likely to persist in an asset market, we need to describe the long run distribution of wealth, not just the level of expected returns. The question of whether there are winning and losing market strategies and how to characterize them has been discussed from a practical point of view in (Cinocotti, 2003). On the one hand, it seems obvious that different investors exhibit different investing behaviors that are responsible for the movement of market prices. On the other hand, it is difficult to reconcile the regular functioning of financial markets with the coexistence of heterogeneous investors with different trading strategies (Levy, 2000). If there exists a consistently winning market strategy, then it is reasonable to assume that the losing trading strategies will disappear in the long run through the force of natural selection.

In this chapter we take an agent-based model approach for a comparative study of different strategies. We examine how traders with various trading strategies affect prices and their success in the market measured by their accumulation of wealth. Specifically, we show that imitators may survive and come to dominate rational investors in wealth when the proportion of imitators is much less than that of rational traders.

The chapter is organized as follows: In Section 2 we survey the related literature. Section 3 describes the relationship between the Ising model and the Logit model. Sections 4 and 5 describe an artificial stock market as the main ingredient in our agent-based financial market. The simulation results and discussion are shown in Sections 6 and 7 respectively. Section 8 concludes the chapter.

RELATED LITERATURE

One can distinguish two competing hypotheses by their origins, one derived from the traditional Efficient Market Hypothesis (EMH) and a recent alternative that is sometimes called the Interacting Agent Hypothesis (IAH) (Tesfatsion, 2002). The EMH states that the price fully and instantaneously reflects any new information: The market is, therefore, efficient in aggregating available information
with its invisible hand. The agents are assumed to be rational and homogeneous with respect to their access and their assessment of information; as a consequence, interactions among them can be neglected.

In recent literature, several papers try to explain the stylized facts as the macroscopic outcome of an ensemble of heterogeneous interacting agents (Cont, 2000; LeBaron, 2001). In this view, the market is populated by agents with different characteristics, such as differences in access to and interpretation of available information, different expectations or different trading strategies. The traders interact, for example, by exchanging information, or they trade by imitating the behavior of others. The market possesses, then, an endogenous dynamics, and the strict one-to-one relationship with the news arrival process does not hold any longer (although the market might still be efficient in the sense of a lack of predictability). The universality of the statistical regularities is seen as an emergent property of this internal dynamics, governed by the interactions among agents.

Boswijk et al. estimated an asset-pricing model using annual US stock price data from 1871 until 2003 (Boswijk, 2004). The estimation results support the existence of two expectation regimes. The first can be characterized as a fundamentalist regime because agents believe in mean reversion of stock prices toward the benchmark fundamental value. The second can be characterized as a chartist trend-following regime because agents expect deviations from the fundamental to trend. The fractions of agents using the fundamentalist forecasting rule and of agents using the trend-following forecasting rule show substantial time variation and switching between predictors.

They suggest that behavioral heterogeneity is significant and that there are two different regimes, a "mean reversion" regime and a "trend following" regime. To each regime, there corresponds a different investor type: fundamentalists and trend followers. These two investor types coexist and their fractions show considerable fluctuation over time. The mean-reversion regime corresponds to the situation in which the market is dominated by fundamentalists who recognize overpricing or underpricing of the asset and who expect the stock price to move back towards its fundamental value. The trend-following regime represents a situation when the market is dominated by trend followers expecting continuation of, for example, good news in the (near) future and so expect positive stock returns.

They also allow the coexistence of different types of investors with heterogeneous expectations about future payoffs and evolutionary switching between different investment strategies. Disagreement in asset pricing models can arise because of two assumptions: differential information and differential interpretation. In the first case, there is an information asymmetry between one group of agents that observes a private signal and the rest of the population that has to learn the fundamental value from public information, such as prices. Asymmetric information causes heterogeneous expectations among agents.

Agents use different "models of the market" to update their subjective valuation based on the earnings news, and this might lead them to hold different beliefs. However, the heterogeneity of expectations might play a significant role in asset pricing. A large number of models have been proposed that incorporate this hypothesis. They assume that agents adopt a belief based on its past performance relative to the competing strategies. If a belief performed relatively well, as measured by realized profits, it attracts more investors while the fraction of agents using the "losing" strategies will decrease. Realized returns thus contribute more support to some of the belief strategies than others, which leads to time variation in the sentiment of the market.

The assumption of evolutionary switching among beliefs adds a dynamic aspect that is missing in most of the models with heterogeneous opinions mentioned above. In our model investors
are boundedly rational because they learn from the past performance of the strategies which one is more likely to be successful in the near future. They do not use the same predictor in every period and make mistakes, but switch between beliefs in order to minimize their errors. Agents may coordinate expectations on trend-following behavior and mean reversion, leading to asset price fluctuations around a constant fundamental price.

(Alfarano, 2004) also estimated a heterogeneous agent model (HAM) to exchange rates with fundamentalists and chartists and found considerable fluctuation of the market impact of fundamentalists. All these empirical papers suggest that heterogeneity is important in explaining the data, but much more work is needed to investigate the robustness of this empirical finding. Our chapter may be seen as one of the first attempts to estimate a behavioral HAM on stock market data and investigate whether behavioral heterogeneity is significant.

INFERRING UTILITY FUNCTIONS OF SUCCESSES AND FAILURES

In this section, we try to infer the utility functions of traders by relating the so-called Ising model and the Logit model. We clarify the following fact: success calls success and failure calls failure.

Ising Model

Bornholdt and his colleagues analyzed profit margins and volatility by using the Ising model, which is a phase transition model in physics (Bornholdt, 2001; Kaizoji, 2001). The Ising model is a model of magnetic substances proposed by Ising in 1925 (Palmer, 1994). In the model, there are two modes of spin: upward (S = +1) and downward (S = -1). Investment attitude in the investor model plays the same role as spin plays in the Ising model. In the model, magnetic interactions seek to align spins relative to one another. The character of the magnetic substance is determined by the interaction of the spins. In the investor model, the two spin states represent an agent's investment attitude. Each agent changes attitude according to the probability of the spin reversing.

The probability P_i(t + 1) that agent i buys at time t + 1 is defined as

$$P_i(t+1) = \frac{1}{1 + \exp\bigl(-2\beta h_i(t)\bigr)}, \qquad (3.1)$$

where

$$h_i(t) = \sum_j J_{ij} S_j(t) - \alpha S_i(t) M(t), \qquad (3.2)$$

$$M(t) = \frac{1}{N}\sum_i S_i(t). \qquad (3.3)$$

In (3.1) h_i(t), defined in (3.2), represents the investment attitude of agent i, the parameter β is a positive constant, and J_ij represents the influence level of neighboring agent j. Therefore, the first term of (3.2) represents the influence of the neighborhood. The investment variables S_j, j = 1, 2, . . ., N take the value -1 when agent j sells and +1 when she buys. The second term of (3.2) represents the average investment attitude, with α a positive constant. If many agents buy, then the investment attitude decreases. The investment attitude represents the agent's conformity with neighboring agents.

The average investment attitude should rise at least so that prices may rise more than this time step. In other words, it is necessary that the number of agents who purchase be greater than this term. It is thought that the probability of the investment attitude changing rises as the absolute value of M(t) approaches one. It can be said that the agent is "applying the brakes" to the action, where "action" refers to the opinion of the neighborhood.
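To make the dynamics of (3.1)-(3.3) concrete, the following minimal sketch simulates the buy/sell spins for agents placed on a ring, so that each agent's neighborhood is just its two adjacent agents. The ring topology, the coupling value J, and all parameter settings are illustrative assumptions rather than values taken from this chapter.

```python
import numpy as np

def simulate_spin_market(n_agents=100, steps=200, J=1.0, alpha=4.0, beta=1.0, seed=0):
    """Toy simulation of the buy/sell spin model in eqs. (3.1)-(3.3).

    S[i] = +1 means agent i buys, S[i] = -1 means it sells.  Agents sit on a
    ring, so each agent is coupled only to its two nearest neighbors (an
    assumption made purely for illustration).
    """
    rng = np.random.default_rng(seed)
    S = rng.choice([-1, 1], size=n_agents)
    history = []
    for _ in range(steps):
        M = S.mean()                                        # eq. (3.3)
        for i in range(n_agents):
            neighbors = S[(i - 1) % n_agents] + S[(i + 1) % n_agents]
            h = J * neighbors - alpha * S[i] * M            # eq. (3.2)
            p_buy = 1.0 / (1.0 + np.exp(-2.0 * beta * h))   # eq. (3.1)
            S[i] = 1 if rng.random() < p_buy else -1
        history.append(S.mean())
    return np.array(history)

if __name__ == "__main__":
    m = simulate_spin_market()
    print("final average investment attitude M(t):", m[-1])
```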
By denoting the joint probability density function of the random variables ε_i, for i = 1, 2, by f(ε_1, ε_2), we can derive

$$p_i = \int_{\varepsilon_1=-\infty}^{+\infty}\int_{\varepsilon_2=-\infty}^{V_1-V_2+\varepsilon_1} f(\varepsilon_1,\varepsilon_2)\,d\varepsilon_2\,d\varepsilon_1. \qquad (3.5)$$

The probability of agent i buying is given as a function of the difference between the utility of buying and the utility of selling.

$$= \frac{2\alpha\beta}{N}\Bigl[\lambda\bigl(n_1(t)-n_2(t)\bigr) - S_i(t)\bigl(N_1(t)-N_2(t)\bigr)\Bigr] \qquad (3.8)$$

We also assume that JN/α = λ and 2αβ/N = 1, and consider the following two cases:

(Case 1) N_1(t) − N_2(t) ≥ 0: then we have

$$V_2(t+1) = \lambda n_2(t) + S_i(t)N_2(t). \qquad (3.10)$$
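Although only fragments of the derivation survive above, the step from (3.5) to a Logit form is the standard discrete-choice argument: if the utility shocks ε_1 and ε_2 are assumed to be independent draws from a type-I extreme-value (Gumbel) distribution (an assumption stated here for illustration), the double integral reduces to a logistic function of the utility difference,

$$p_i = \Pr\bigl(V_1 + \varepsilon_1 \ge V_2 + \varepsilon_2\bigr) = \frac{1}{1+\exp\bigl(-(V_1-V_2)\bigr)},$$

which has exactly the same form as the buying probability (3.1); identifying V_1 − V_2 with 2βh_i(t) is what links the Ising spin-flip probability to a Logit choice between the utility of buying and the utility of selling.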
other poorly understood set of design questions. However, on the positive side, it may allow one to study the impact of different trading mechanisms, all of which would be inconsequential in an equilibrium world.

Most agent-based markets have solved this problem in one of three ways: by assuming a simple price response to excess demand, by building the market in such a way that a kind of local equilibrium price can be found easily, or by explicitly modeling the dynamics of trading to look like the continuous trading in an actual market. Most of the earliest agent-based markets used the first method to model price movements. Most markets of this type poll traders for their current demand, sum up the market demand and if there is excess demand, increase the price. If there is an excess supply, they decrease the price.

This has been interpreted as evidence that as a forecaster ages, evaluators develop tighter prior beliefs about the forecaster's ability, and hence the forecaster has less incentive to herd with the group. On the other hand, the incentive for a second-mover to discard his private information and instead mimic the market leader increases with his initial reputation, as he strives to protect his current status and level of pay. In a practical implementation of a trading strategy, it is not sufficient to know or guess the overall direction of the market. There are additional subtleties governing how the trader is going to enter (buy or sell in) the market. For instance, a trader will want to be slightly ahead of the herd to buy at a better price, before the price is pushed up by the bullish consensus. Symmetrically, she will want to exit the market a bit before the crowd, that is, before a trend reversal. In other words, she would like to be somewhat of a contrarian by buying when the majority is still selling and by selling when the majority is still buying, slightly before a change of opinion of the majority of her "neighbors". This means that she will not always want to follow the herd, at least at finer time scales. At this level, she cannot rely on the polling of her "neighbors" because she knows that they, as well as the rest of the crowd, will have similar ideas about trying to outguess each other on when to enter the market. More generally, ideally she likes to be in the minority when entering the market, in the majority while holding her position and again in the minority when closing her position.

HYPOTHETICAL VALIDATION USING AN AGENT-BASED MODEL

In this section we introduce three population-types that have been already described in the literature and that represent more realistic trading behaviors. The aim is twofold: First, we want to study the behavior of these stylized populations in a realistic environment characterized by limited resources and a market clearing mechanism. Second, we want to address the important issue of whether or not winning strategies exist. The fractions of agents using the fundamental and trend-following forecasting rules show substantial time variation and switching between predictors.

Market Mechanism and Performance Measures

One of the most important parts of agent-based markets is the actual mechanism that governs the trading of assets. Most agent-based markets assume a simple price response to excess demand and the market is built so that finding a local equilibrium price is not difficult. If supply exceeds demand, then the price decreases. Agents maintain stock and capital, and stock is bought or sold in exchange for capital. The model generates fluctuations in the value of the stock by limiting transactions to one unit of stock.

Price model. The basic model assumes that the stock price reflects the level of excess demand, which is governed by
$$P(t) = P(t-1) + \chi\,[N_1(t) - N_2(t)], \qquad (5.1)$$

where P(t) is the stock price at time t, N_1(t) and N_2(t) are the corresponding number of agents buying and selling respectively, and χ is a constant. This expression implies that the stock price is a function of the excess demand. That is, the price rises when there are more agents buying, and it descends when more agents are selling.

Price volatility.

$$v(t) = \frac{P(t) - P(t-1)}{P(t-1)}. \qquad (5.2)$$

Individual wealth. We introduce the notional wealth W_i(t) of agent i into the model as follows:

$$W_i(t) = P(t)\,\Phi_i(t) + C_i(t), \qquad (5.3)$$

where Φ_i is the amount of assets (stock) held and C_i is the amount of cash (capital) held by agent i. It is clear from the equation that an exchange of cash for assets at any price does not in any way affect the agent's notional wealth. However, the important point is that the wealth W_i(t) is only notional and not real in any sense. The only real measure of wealth is C_i(t), the amount of capital the agent has available to spend. Thus, it is evident that an agent has to do a "round trip" (buy (sell) a stock and then sell (buy) it back) to discover whether a real profit has been made.

Trader Types

For modeling purposes, we use representative agents (rational agents) who make rational decisions in the following stylized terms: If they expect the price to go up then they buy, and if they expect the price to go down then they sell immediately. But this then leads to the problem, what happens if every trader behaves in this same way?

Here, some endogenous disturbances need to be introduced. Given these disturbances, the individual is modeled to behave differently. One body of research has sought to explain the data with aggregate models in which a representative agent solves this optimization problem. If the goal is simply to fit the data, it is not unreasonable to attribute to agents the capacity to explicitly formulate and solve dynamic programming problems. However, there is strong empirical evidence that humans do not perform well on problems whose solution involves backward induction. For this reason, these models fail to provide a realistic account of the phenomenon. The model we describe will not invoke a representative agent, but will posit a heterogeneous population of individuals. Some of these will behave "as if" they were fully informed optimizers, while others will not. Social networks and social interactions, clearly absent from the prevailing literature, will play an explicit central role.

Heterogeneity turns up repeatedly as a crucial factor in many evolving systems and organizations. But the situation is not always as simple as saying that heterogeneity is desirable and homogeneity is undesirable. This remains a basic question in many fields: What is the right balance between heterogeneity and homogeneity? When heterogeneity is significant, we need to be able to show the gains associated with it. However, analysis of a collection of heterogeneous agents is difficult, often intractable.

The notion of type facilitates the analysis of heterogeneity. A type is a category of agents within the larger population who share some characteristics. We distinguish types by some aspects of the agents' unobservable internal model that characterize their observable behaviors. One can imagine how such learning models might evolve over time towards equilibria. In principle, this evolutionary element can be folded into a meta-learning that includes both the short-term learning and long-term evolution.

Interaction between agents is a key feature of agent-based systems. Traditional market models do not deny that agents interact but assume that they only do so through the price system.
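Returning to the market mechanism for a moment, the short sketch below wires equations (5.1)-(5.3) together. The constant χ, the demand counts, and the endowments are illustrative assumptions, not parameter values from the chapter.

```python
from dataclasses import dataclass

@dataclass
class Trader:
    stock: float   # Phi_i(t): units of stock held
    cash: float    # C_i(t): capital held

    def notional_wealth(self, price: float) -> float:
        return price * self.stock + self.cash          # eq. (5.3)

def next_price(prev_price: float, n_buy: int, n_sell: int, chi: float = 0.1) -> float:
    return prev_price + chi * (n_buy - n_sell)         # eq. (5.1)

def volatility(price: float, prev_price: float) -> float:
    return (price - prev_price) / prev_price           # eq. (5.2)

# Illustrative usage with assumed numbers.
p_old = 100.0
p_new = next_price(p_old, n_buy=60, n_sell=40)
trader = Trader(stock=1.0, cash=50.0)
print(p_new, volatility(p_new, p_old), trader.notional_wealth(p_new))
```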
Yet agents do, in fact, communicate with each other and learn from each other. The investor who enters the market forecasts stock prices by various techniques. For example, the investor makes a linear forecast of past price data, forecasts based on information from news media, and so forth. The types of typical investors are usually described based on differences in their forecast methods (or methods of deciding their investment attitude). Three typical investor types are as follows:

• Fundamentalist: investor with a fundamentalist investment attitude based on various economic indicators.
• Chartist: investor who uses analysis techniques for finding present value from charting past price movement.
• Noise trader: investor who behaves according to a strategy not based on fundamental analysis.

In this chapter, traders are segmented into two basic types depending on their respective trading behavior: rational traders and imitators. Rational traders are further classified into two types: momentum and contrarian traders.

1. Rational traders. If we assume the fundamental value is constant, their investment strategy is based on their expectation of the trend continuing or reversing.
Momentum trader: These traders are trend followers who make decisions based on the trend of past prices. A momentum trader speculates that if prices are rising, they will keep rising, and if prices are falling, they will keep falling.
Contrarian trader: These traders differ in trading behavior. Contrarian traders speculate that, if the price is rising, it will stop rising soon and will decrease, so it is better to sell near the maximum. Conversely, if the price is falling, it will stop falling soon and will rise. This trading behavior is present among actual traders, albeit it is probably less popular than trend-following strategies.
2. Imitators. Traders may have incorrect expectations about price movements. If there are such misperceptions, imitators who do not affect prices may earn higher payoffs than strategic traders. Each imitator has a unique social network with strategic traders. Within this individual network, if the majority of strategic traders buy then she also buys, and if the majority of strategic traders sell then she also sells. It is now widely held that mimetic responses result in herd behavior and, crucially, that the properties of herding arise in financial markets.

Trading Rules of Trader Types

Agents are categorized by their strategy space. Since the space of all strategies is complex, this categorization is not trivial. Therefore, we might, for example, constrain the agents to be finite automata with a bounded number of states. Even after making this kind of limitation, we might still be left with too large a space to reason about, but there are further disciplined approaches to winnowing down the space. An example of a more commonly used approach is to assume that the opponent is a "rational learner" and to place restrictions on the opponent's prior about our strategies. In this section we describe a trading rule for each type of trader discussed in the previous section.

1. Rational traders (fundamentalists): Rational traders observe the trend of the market and trade so that their short-term payoff will be improved. Therefore if the trend of the market is "buy", this agent's attitude is "sell". On the other hand, if the trend of the market is "sell", this agent's attitude is "buy". As has been explained, trading according to the minority decision creates wealth for the agent on performing the necessary trade, whereas trading according to
the majority decision loses wealth. However, if the agent has held the asset for a length of time between buying it and selling it back, his wealth will also depend on the rise and fall of the asset price over the holding period. On the other hand, the amount of stock that the purchaser (seller) can put in a single deal and buy (sell) is one unit. Therefore, when the numbers of purchasers and sellers are different, there exists an agent who cannot make her desired transaction:
◦ When sellers are in the majority: There is an agent who cannot sell even if she is selected to sell. Because the price still falls in a buyer's market, the agents who sell are those maintaining a large amount of property. The agents who maintain the most property are the ones able to sell.
◦ When buyers are in the majority: There is an agent who cannot buy even if she is selected to buy. Because the price rises, the agent still able to buy is the one maintaining a large amount of capital. The agents who maintain the most capital are the ones able to buy.

The above trading behavior is formulated as follows. We use the following terminology:

N_1(t): Number of agents who buy at time t.
N: Number of agents who participate in the market.
R(t) = N_1(t)/N: The rate of agents buying at time t.

We also denote the estimated rate of buying of agent i at time t as

$$R_F^i(t) = R(t-1) + \varepsilon_i, \qquad (5.5)$$

where ε_i (-0.5 < ε_i < 0.5) is the rate of bullishness and timidity of the agent and differs depending on the agent.

Trading rule for rational traders:

If R_F^i(t) > 0.5 then sell
If R_F^i(t) < 0.5 then buy   (5.6)

If ε_i is large, agent i has a tendency to "buy", and if it is small, agent i has a tendency to "sell".

2. Imitators (chartists): These agents watch the behavior of the rational traders. If the majority of rational traders "buy" then the imitators "buy", and if the majority of rational traders "sell" then the imitators "sell".

We can formulate the imitators' behavior as follows:

R_S(t): The fraction of rational traders buying at time t
P_I(t): The value of R_S(t) estimated by imitator j

$$P_I(t) = R_S(t-1) + \varepsilon_j, \qquad (5.7)$$

where ε_j (-0.5 < ε_j < 0.5) is the rate of bullishness and timidity of imitator j and, in these experiments, ε_j is normally distributed.

Trading rule for imitators:

If P_I(t) > 0.5 then buy
If P_I(t) < 0.5 then sell   (5.8)

SIMULATION RESULTS

We consider an artificial stock market consisting of 2,500 traders in total. In Figure 1, we show market prices over time for varying fractions of rational traders and imitators.
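Putting rules (5.5)-(5.8) together, the following sketch shows one way a single decision step could be coded. The uniform and normal noise distributions follow the text above, but the normal spread and the sample buy rates are assumptions made purely for illustration.

```python
import random

def rational_attitude(prev_buy_rate: float, eps_i: float) -> str:
    """Rational (minority-seeking) trader, eqs. (5.5)-(5.6): trade against
    the estimated rate of agents buying."""
    r_f = prev_buy_rate + eps_i            # eq. (5.5)
    return "sell" if r_f > 0.5 else "buy"

def imitator_attitude(prev_rational_buy_rate: float, eps_j: float) -> str:
    """Imitator, eqs. (5.7)-(5.8): follow the estimated majority of the
    rational traders in the imitator's social network."""
    p_i = prev_rational_buy_rate + eps_j   # eq. (5.7)
    return "buy" if p_i > 0.5 else "sell"

# Illustrative draws: eps_i is uniform on (-0.5, 0.5) as in the text;
# the normal spread for eps_j is an assumed value.
eps_i = random.uniform(-0.5, 0.5)
eps_j = random.gauss(0.0, 0.1)
print(rational_attitude(0.6, eps_i), imitator_attitude(0.6, eps_j))
```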
Figure 1. Market prices over time for varying fractions of rational traders and imitators
Stock Prices Over Time

Imitators mimic the movement of a small number of rational traders. If the rational traders start to raise the stock price, the imitators also act to raise the stock price. If the rational traders start to lower the stock price, the imitators lower the stock price further. Therefore, the actions of a large number of imitators amplify the price movement caused by the rational traders, increasing the fluctuation in the value of the stock.

Increasing the fraction of rational traders stabilizes the market. Maximum stability is achieved when the fraction of rational traders in the population is 70% and that of the imitators 30%. On the other hand, increasing the number of the rational traders further induces more fluctuation, and the price will cycle up and down if the fraction of rational traders is increased to 80%. Rational traders always trade in whatever direction places them in a minority position. In this situation, their actions do not induce fluctuations in the market
Figure 2. Movement of price over time when the fraction of rational traders increases gradually from 20% to 80%
Figure 3. Movement of price over time when the fraction of rational traders moves randomly between 20% and 80%

price. However, when rational traders are in the majority, their movements give rise to large market price fluctuations.

In Figure 2 we show the price movement when the fraction of rational traders is increased gradually from 20% to 80%. Figure 3 shows the price movement when the fraction of rational traders moves randomly between 20% and 80%.

Comparison of Wealth

We also show the average wealth of the rational traders and imitators over time for varying fractions of rational traders and imitators.

We now conduct a comparative study of rational traders and imitators. Imitators only mimic the actions of rational traders. On the other hand, rational traders deliberately consider the direction of movement of the stock price. Our question is which type is better off in terms of their accumulated wealth.

When rational traders are not in the majority (their fraction is less than 50%), their average wealth increases over time and that of the imitators decreases. Therefore, if the rational traders are in the minority, they are better off and their successful accumulation of wealth is due to losses by the majority, the imitators.

In the region where the number of the rational traders is almost the same as the number of imitators, no trader is a winner or a loser and none accumulates wealth.

On the other hand, when the rational traders are in the majority, and the imitators are in the minority, the average wealth of the imitators increases over time and that of the rational traders decreases. Therefore, when the imitators are in the minority, they are better off and their successful accumulation of wealth is due to losses by the majority, the rational traders.

Evolution of the Population

We then change the composition of the traders using an evolutionary technique. Eventually, poor traders learn from other, wealthy traders. Figure 5 shows the two typical cases of evolution.

Domination occurs when traders evolve according to total assets because it takes some time to reverse the disparity in total assets between winners and losers. On the other hand, rational agents and imitators coexist when they evolve according to their gain in assets. An important point is that coexistence is not a normal situation. Various conditions are necessary for both types to coexist, including an appropriate updating scheme.
Figure 4. Changes in average wealth over time for different fractions of rational traders and imitators
Figure 5. Time path of the composition of traders’ types (a) Evolution by wealth (sum of cash and stocks),
(b) Evolution by gain in wealth
Trading strategies yield different results under different market conditions. In real life, different populations of traders with different trading strategies do coexist. These strategies are boundedly rational and thus one cannot really invoke rational expectations in any operational sense. Though market price processes in the absence of arbitrage can always be described as the rational activity of utility maximizing agents, the behavior of these agents cannot be operationally defined. This work shows that the coexistence of different trading strategies is not a trivial fact but requires explanation.

One could randomize strategies, imposing that traders statistically shift from one strategy to another. It is however difficult to explain why a trader embracing a winning strategy should switch to a losing strategy. Perhaps the market changes continuously and makes trading strategies randomly more or less successful. More experimental work is necessary to gain an understanding of the conditions that allow the coexistence of different trading populations. As noted earlier, there are two broad types of agents and we designate them "strategic traders" ("rational agents") and "imitators". The agents in our model fall into two categories. Members of one group (strategic traders) adopt the optimal decision rules. If they expect the price to go up, then they will buy, and if they expect the price to go down, then they will sell immediately. In order to introduce heterogeneity among strategic agents we also introduce some randomness in the behavioral rules. The other group consists of imitators, who mimic the strategic traders of their social networks. The model we describe does not invoke a representative agent, but posits a heterogeneous population of agents. Some of these behave as if they are fully informed optimizers, while others do not.

SUMMARY AND FUTURE WORK

Experimental economics and psychology have now produced strong empirical support for the view that framing effects as well as contextual and other psychological factors put a large gap between homo-sapiens and individuals with bounded rationality. The question we pose in this chapter is as follows: Does that matter and how does it matter? To answer these questions, we developed a model in which imitation in social networks can ultimately yield high aggregate levels of optimal behavior. It should be noted that the fraction of agents who are rational in such an imitative system will definitely affect the stock market. But the eventual (asymptotic) attainment per se of such a state need not depend on the
extent to which rationality is bounded. Perhaps the main issue then is not how much rationality there is at the micro level, but how little is enough to generate macro-level patterns in which most agents are behaving "as if" they were rational, and how various social networks affect the dynamics of such patterns.

We conclude by describing our plan for further research. An evolutionary selection mechanism based on relative past profits will govern the dynamics of the fractions and the switching of agents between different beliefs or forecasting strategies. A strategy attracts more agents if it performed relatively well in the recent past compared to other strategies. There are two related theoretical issues. One is the connection between individual rationality and aggregate efficiency, that is, between optimization by individuals and optimality in the aggregate. The second is the role of social interactions and social networks in individual decision-making and in determining macroscopic outcomes and dynamics. Regarding the first, much of mathematical social science assumes that aggregate efficiency requires individual optimization. Perhaps this is why bounded rationality is disturbing to most economists: They implicitly believe that if the individual is not sufficiently rational it must follow that decentralized behavior is doomed to produce inefficiency.

REFERENCES

Alfarano, S., Wagner, F., & Lux, T. (2004). Estimation of Agent-Based Models: The case of an asymmetric herding model.

Bornholdt, S. (2001). Expectation bubbles in a spin model of markets. International Journal of Modern Physics C, 12(5), 667–674. doi:10.1142/S0129183101001845

Boswijk, H. P., Hommes, C. H., & Manzan, S. (2004). Behavioral Heterogeneity in Stock Prices.

Cincotti, S., Focardi, S., Marchesi, M., & Raberto, M. (2003). Who wins? Study of long-run trader survival in an artificial stock market. Physica A, 324, 227–233. doi:10.1016/S0378-4371(02)01902-7

Cont, R., & Bouchaud, J.-P. (2000). Herd behavior and aggregate fluctuations in financial markets. Macroeconomic Dynamics, 4, 170–196.

De Long, J. B., Shleifer, A., Summers, L. H., & Waldmann, R. J. (1991). The survival of noise traders in financial markets. The Journal of Business, 64(1), 1–19. doi:10.1086/296523

Durlauf, S. N., & Young, H. P. (2001). Social Dynamics. Brookings Institution Press.

Johnson, N., Jeffries, P., & Hui, P. M. (2003). Financial Market Complexity. Oxford.

Kaizoji, T., Bornholdt, S., & Fujiwara, Y. (2002). Dynamics of price and trading volume in a spin model of stock markets with heterogeneous agents. Physica A.

LeBaron, B. (2001). A builder's guide to agent-based financial markets. Quantitative Finance, 1(2), 254–261. doi:10.1088/1469-7688/1/2/307

LeBaron, B. (2000). Agent based computational finance: suggested readings and early research. Journal of Economic Dynamics & Control, 24, 679–702. doi:10.1016/S0165-1889(99)00022-6

Levy, M., Levy, H., & Solomon, S. (2000). Microscopic Simulation of Financial Markets: From Investor Behavior to Market Phenomena. San Diego: Academic Press.

Lux, T., & Marchesi, M. (1999). Scaling and criticality in a stochastic multi-agent model of a financial market. Nature, 397, 498–500. doi:10.1038/17290
Palmer, R. G., Arthur, W. B., Holland, J. H., LeBaron, B., & Tayler, P. (1994). Artificial economic life: A simple model of a stock market. Physica D: Nonlinear Phenomena, 75, 264–274. doi:10.1016/0167-2789(94)90287-9

Raberto, M., Cincotti, S., Focardi, M., & Marchesi, M. (2001). Agent-based simulation of a financial market. Physica A, 299(1-2), 320–328. doi:10.1016/S0378-4371(01)00312-0

Sornette, D. (2003). Why stock markets crash. Princeton University Press.

Tesfatsion, L. (2002). Agent-based computational economics: Growing economies from the bottom up. Artificial Life, 8, 55–82. doi:10.1162/106454602753694765

KEY TERMS AND DEFINITIONS

Artificial Market: an approach to studying markets by constructing a market artificially.
Agent-Based Model: a class of computational models for simulating the actions and interactions of autonomous agents.
Rational Trader: a type of trader whose decisions to buy, sell, or hold are based on fundamental analysis.
Noise Trader: a type of trader whose decisions to buy, sell, or hold are not based on fundamental analysis.
Ising Model: a mathematical model of ferromagnetism in statistical mechanics.
Logit Model: a mathematical model of human decision making in statistics.
Chapter 8
Agent-Based Modeling Bridges
Theory of Behavioral Finance
and Financial Markets
Hiroshi Takahashi1
Keio University, Japan
Takao Terano2
Tokyo Institute of Technology, Japan
ABSTRACT
This chapter describes advances in applying agent-based models to financial market analyses, based on our recent research. We have developed several agent-based models to analyze microscopic and macroscopic links between investor behaviors and price fluctuations in a financial market. The models are characterized by a methodology that analyzes the relations among micro-level decision making rules of the agents and macro-level social behaviors via computer simulations. In this chapter, we report the outline of recent results of our analysis. From the extensive analyses, we have found that (1) investors' overconfident behaviors play various roles in a financial market, (2) overconfident investors emerge in a bottom-up fashion in the market, (3) they contribute to efficient trades in a market that adequately reflects fundamental values, (4) the passive investment strategy is valid in a realistically efficient market; however, it could have bad influences such as market instability and inadequate asset pricing deviations, and (5) under certain assumptions, the passive investment strategy and the active investment strategy could coexist in a financial market.
investors. CAPM indicates that the optimal investment strategy is to hold the market portfolio (Sharpe, 1964).

However, conventional finance theory has met severe critiques about the validity of its assumptions on the markets, or its capability to explain real world phenomena. For example, the worldwide financial crisis in 2008 was said to be an event that would occur only once in ten decades. Recently, N. N. Taleb has described the role of accidental effects in financial markets and human cognition about those effects (Taleb, 2001). Also, researchers in behavioral finance have raised some doubts about the efficient market assumption, by arguing that an irrational trader could have influences on asset prices (Shiller, 2000; Shleifer, 2000; Kahneman, Tversky, 1979; Kahneman, Tversky, 1992).

To address the problems, we employ agent-based models (Arthur, 1997; Axelrod, 1997) in order to analyze the relation between micro-rules and macro-behavior (Axtell, 2000; Russell, 1995). In the literature, it has frequently been reported that a variety of macro-behavior emerges bottom-up from local micro-rules (Epstein, 1996; Levy, 2000; Terano, 2001; Terano, 2003; Arthur, 1997; Tesfatsion, 2002). We have developed an artificial financial market model with decision making agents. So far, we have reported on micro-macro links among agents and markets, investors' behaviors with various mental models, and risk management strategies of the firms (Takahashi, 2003; Takahashi, 2004; Takahashi, 2006; Takahashi, 2007; Takahashi, 2010). In this chapter, based on our recent research, we will describe the basic principles and architecture of our simulator and explain our main findings. The objective of the research is to investigate (1) the influences of micro- and macro-level investment strategies, (2) the roles of the evaluation method, and (3) financial behaviors, when there are so many investors with different strategies.

The next section of this chapter describes the model utilized for this analysis, then analysis results are discussed in sections 3 and 4. Section 5 contains the summary and conclusion.

DESCRIPTION OF AN AGENT-BASED FINANCIAL MARKET MODEL

Basic Framework and Architecture of Models of a Financial Market

In our research, first, we have observed the macro level phenomena of a real financial market, then, second, we have modeled the phenomena in an artificial market in a computer. To model the market, third, we have introduced micro level decision making strategies of human investors based on the recent research on behavioral financial theory and cognitive science (Shleifer, 2000). Fourth, we have designed micro-macro level interactions in the artificial market, which are not able to be examined in the real world. Therefore, our method is a constructive approach to bridge the state-of-the-art financial theory and real behaviors in a market through agent-based models. The framework is summarized in Figure 1.

Based on the framework, we have implemented a common artificial market model depicted in Figure 2. The market model is characterized as follows: (1) benefit and/or loss of a firm is randomly determined, (2) the information is observed by investor agents to make their investment decisions, (3) based on the decisions, agents trade the financial assets in the artificial market, and the market prices are determined, and (4) the determined prices of the market again give the effects of decision making of the agents. The detailed descriptions of the model are given below.

An agent-based simulator of the financial market involving 1,000 investors is used as the model for this research. Several types of investors exist in the market, each of them undertakes
transactions based on their own stock calculations. They trade shares and risk-free assets through the two possible transaction methods. The execution of the simulator consists of three major steps: (1) generation of corporate earnings, (2) formation of investor forecasts, and (3) setting transaction prices. The market conditions will change through these steps. For the details of the parameters of the simulator, please refer to Appendix 1.

Assets Traded in the Market

The market consists of both risk-free and risky assets. Regarding the risky assets, all profits gained during each term are distributed to the shareholders. Corporate earnings (y_t) are expressed as

$$y_t = y_{t-1}\,(1 + \varepsilon_t),$$

and they are generated according to the process ε_t ~ N(0, σ_y²).
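A minimal sketch of this earnings process is given below; the starting value, the volatility σ_y, and the horizon are assumed numbers used only for illustration.

```python
import numpy as np

def corporate_earnings(y0: float, n_terms: int, sigma_y: float, seed: int = 0) -> np.ndarray:
    """Generate y_t = y_{t-1} * (1 + eps_t) with eps_t ~ N(0, sigma_y**2)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma_y, size=n_terms)
    y = np.empty(n_terms + 1)
    y[0] = y0
    for t in range(1, n_terms + 1):
        y[t] = y[t - 1] * (1.0 + eps[t - 1])
    return y

# Illustrative run with assumed y0 and sigma_y.
print(corporate_earnings(y0=1.0, n_terms=5, sigma_y=0.02))
```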
Risky assets are traded just after the public announcement of the profit for the term. Each investor is given common asset holdings at the start of the term, with no limit placed on debit and credit transactions.

Modeling Passive Investors

Passive investors of the simulation model invest their assets with the same ratio as the market benchmarks. This means that (1) each passive investor keeps one volume of stock during the investment periods, (2) the investment ratio to the stocks is automatically determined, and (3) the trade strategy follows buy-and-hold of the initial

(b) Trend Predictors

We formulate a model of the investor who finds out trends from randomly fluctuating stock prices. This type of investor predicts the stock price P^f_{t+1} of the next period by extrapolating the latest stock trends (10 days). The trend predictors forecast the stock price P^f_{t+1} and the profit y^f_{t+1} from the trend at period t-1 as

$$P^f_{t+1} = P_{t-1}\,(1 + a_{t-1})^2 \quad\text{and}\quad y^f_{t+1} = y_t\,(1 + a_{t-1}),$$

where

$$a_{t-1} = \frac{1}{10}\sum_{i=1}^{10}\left(\frac{P_{t-i}}{P_{t-i-1}} - 1\right).$$
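A small sketch of the trend predictor's forecast is shown below; the price history used in the usage example is an assumed, artificially generated series rather than data from the chapter.

```python
def trend_forecast(prices, y_t):
    """Trend predictor: average the last ten one-period returns (a_{t-1}) and
    extrapolate, giving P^f_{t+1} = P_{t-1}(1 + a)^2 and y^f_{t+1} = y_t(1 + a).

    `prices` must hold at least 11 past prices ordered oldest to newest and
    ending at P_{t-1}.
    """
    if len(prices) < 11:
        raise ValueError("need at least 11 past prices")
    recent = prices[-11:]                       # P_{t-11}, ..., P_{t-1}
    a = sum(recent[k + 1] / recent[k] - 1.0 for k in range(10)) / 10.0
    return recent[-1] * (1.0 + a) ** 2, y_t * (1.0 + a)

# Illustrative usage with an assumed 2%-per-term uptrend.
history = [100.0 * 1.02 ** k for k in range(11)]
print(trend_forecast(history, y_t=1.0))
```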
ferent future prospects with confidence. It seems that all investors tend to have overconfidence in varying degrees.

We formulate the model of investors who are overconfident in their own predictions by assuming that they underestimate the risk of the stock by the value k (k = 0.6), as (σ^s)² = k (σ^h)².

Calculation of Expected Return Rate of the Stock

The investors in the market predict the stock price P^f_{t+1} and the corporate profit y^f_{t+1} at the term t+1 based on the corporate profit y_t at the term t and the stock prices at and before the term t-1 (P_{t-1}, P_{t-2}, P_{t-3}). In the following, we represent the predicted values of the stock price and the corporate profit by the investor i (i = 1, 2, 3) as P^{f,i}_{t+1} and y^{f,i}_{t+1}, respectively. The expected rate of return on the stock for the investor i, r^{int,i}_{t+1}, is calculated as follows:

$$r^{int,i}_{t+1} = \left[r^{im}_{t+1}\,c^{-1}(\sigma^s_{t-1})^{-2} + r^{f,i}_{t+1}\,(\sigma^s_{t-1})^{-2}\right]\left[c^{-1}(\sigma^s_{t-1})^{-2} + (\sigma^s_{t-1})^{-2}\right]^{-1},$$

where $r^{f,i}_{t+1} = \bigl((P^{f,i}_{t+1} + y^{f,i}_{t+1})/P_t - 1\bigr)\,(1 + \varepsilon^i_t)$ and $r^{im}_{t+1} = 2\lambda(\sigma^s_{t-1})^2 W_{t-1} + r_f$ (Black, Litterman, 1992).

Determination of Trading Prices

The traded price of the stock is determined at the price at which the demand meets the supply (Arthur, 1997). Both the investment ratio (w^i_t) and the number of shares demanded are decreasing functions of the stock price, and the total number of the stock issued in the market (N) is constant. We derive the traded price by calculating the price (P_t) at which the total amount of the stock retained by the investors equals the number issued, that is,

$$\sum_{i=1}^{M}\frac{F^i_t\,w^i_t}{P_t} = N.$$

Natural selection rules are equipped in the market to represent the changes of cumulative excess return for the most recent 5 terms (Takahashi, Terano, 2003). The rules are divided into two steps: (1) appointment of investors who alter their investment strategy, and (2) alteration of investment strategy. Whether an investor alters its investment strategy is decided based upon the most recent performance of each 5-term period after 25 terms have passed since the beginning of market transactions. In addition, for simplicity, investors are assumed to alter their investment strategy toward the one with the higher cumulative excess return over the most recent 5 terms.

Using the models, we have investigated the roles of over-confidence agents and passive and active strategies in the following two sections.
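Before turning to those experiments, here is a rough sketch of the price-determination step described above. It treats each investor's total assets F_t^i and investment ratio w_t^i as given for the step; in the model itself w_t^i depends on the price, so the clearing price is found accordingly. The sample figures below are assumptions.

```python
def clearing_price(assets, ratios, n_shares):
    """Solve sum_i (F_i * w_i) / P = N for the price P, i.e. the price at
    which the stock demanded by all investors equals the shares issued."""
    stock_demand_value = sum(f * w for f, w in zip(assets, ratios))
    return stock_demand_value / n_shares

# Three investors with assumed asset levels and investment ratios, and an
# assumed number of issued shares.
print(clearing_price(assets=[2000.0, 2000.0, 1500.0],
                     ratios=[0.5, 0.4, 0.6],
                     n_shares=30.0))
```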
Over-Confidence Agents Will Survive in a Market

First, we have analyzed the initial forecasting model ratio where there was (1) a higher ratio of fundamental forecasting, and (2) a higher ratio of trend forecasting. As the results of this analysis suggested that these higher ratios strengthen the degree of overconfidence in both cases, we then analyzed a random distribution of the initial ratio of each forecasting model to determine whether the same result could be obtained under different conditions. The results of this analysis are explained in detail below.

Searching for Investment Strategies

(a) When there is a High Ratio of Fundamental Forecasting

the average value of past equity prices. In contrast, overconfident investors survive in the market even when a random initial value is applied for the degree of overconfidence (Figures 6 and 7). This interesting analysis result suggests the possibility of universality when survival trends of overconfident investors are compared with the forecasting model.
transaction prices almost match the fundamental value in this case.

Traditional finance argues that market survival is possible for those investors able to swiftly and accurately estimate both the risk and rate of return on stock, achieving market efficiency. However, the analysis results obtained here regarding the influence irrational investors have on prices suggest a different situation, pointing to the difficulty of market modeling which takes real conditions into account.

HOW PASSIVE AND ACTIVE INVESTORS BEHAVE

The series of experiments on the behaviors of passive and active investors is divided into two parts: First, we have fundamentalist agents and passive-investment agents in the market to investigate the influences of the two strategies. Next, in order to analyze the effects, we introduce the other kinds of investors, such as trend chasers,
loss-overestimating investors, or overconfidence investors.

Trading with Fundamentalist and Passive Investors

Figures 10 and 11 illustrate the case where there exist the same number (500 each) of the two kinds of investors (Case 0). Figure 10 shows the histories of stock prices. The solid line in Figure 10 represents the traded price and the line with the x mark represents the fundamental value. Figure 11 depicts the histories of cumulative excess returns of each investor. This graph shows that the fluctuation of the traded price agrees with that of the fundamental value. The line with the x mark in Figure 11 shows the performance of the passive investment strategy and the dotted lines show those of the fundamentalists. The performances of the fundamentalists are slightly different from one another, because each fundamentalist has its own prediction error. As the traditional asset pricing
theory suggests, the trading prices coincide with the fundamental values, and fundamentalist and passive investors get the same profit on average.

Next, using the natural selection principles of Genetic Algorithms (see Appendix 2 for details), we let the investor agents change their strategies when (1) the excess returns are under a target (e.g., 10%), (2) the excess returns are under 0%, and (3) the excess returns are very bad (e.g., under -10%).

The results of Case 1 are shown in Figures 12, 13, 14, and 15. The results of Case 2 are shown in Figures 16, 17, 18, and 19. Figures 12, 13, 14, and 15 are obtained by 100 experiments, each of which consists of 3,000 simulation steps.

In Case 1, the traded price changes in accordance with the fundamental value and both types of investors coexist in the market. On the other hand, in Case 2, the traded price doesn't reflect the fundamental value and only passive investors can survive in the market after around 1,600 time steps. This result is quite different from the one in Case 1. In Case 3, we have obtained results similar to the ones in Case 2. These differences among the experiments are brought about by the difference in evaluation methods. In this sense, evaluation methods have a great influence on the financial markets.
fident Investors. First, the results of Case 4 with 400 Fundamentalists, 400 trend chasers, and 200 passive investors are shown in Figures 20 and 21. Second, the results of Case 5 with 400 Fundamentalists, 400 Over Confident Investors, and 200 passive investors are shown in Figures 22 and 23. Third, the results of Case 6 with 400 Fundamentalists, 400 Investors with Prospect Theory, and 200 passive investors are shown in Figures 24 and 25.

In all cases, we have observed that passive investors keep their moderate positions positive, even when stock prices largely deviate from the fundamental value. In other words, the passive investment strategy is the most effective way if investors do not want to get the worst result in any case, even if they have failed to get the best result. In the asset management business, some investors adopt passive investment to avoid getting the worst performance.

Figures 26 and 27 show the results where the agents are able to change their strategy when the excess returns are less than 0. In this experiment, we have slightly modified the natural selection rule described in the previous section. In the following experiments, investors change their strategy depending on their recent performance in the same way as in the previous section, and after that, investors change their strategy randomly with a small probability (0.01%), which corresponds to
strategy is valid in a realistic efficient market; however, it could have bad influences such as market instability and inadequate asset pricing deviations, and (5) under certain assumptions, the passive investment strategy and the active investment strategy could coexist in a financial market. These results have been described in more detail elsewhere (e.g., Takahashi, Terano 2003, 2004, 2006a, 2006b, Takahashi, Takahashi, Terano 2007). Using the common simple framework presented in the chapter, we have found various interesting results, which may or may not coincide with both financial theory and real world phenomena. We believe that the agent-based approach would be fruitful for future research on social systems including financial problems, if we continue the effort to demonstrate its effectiveness (Terano, 2007a, 2007b; Takahashi, 2010).

REFERENCES

Arthur, W. B., Holland, J. H., LeBaron, B., Palmer, R. G., & Taylor, P. (1997). Asset Pricing under Endogenous Expectations in an Artificial Stock Market. In The Economy as an Evolving Complex System, II (pp. 15–44). Addison-Wesley.

Axelrod, R. (1997). The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press.

Axtell, R. (2000). Why Agents? On the Varied Motivations for Agent Computing in the Social Sciences. The Brookings Institution Center on Social and Economic Dynamics Working Paper, November, No. 17.

Bazerman, M. (1998). Judgment in Managerial Decision Making. John Wiley & Sons.

Black, F., & Litterman, R. (1992, Sept/Oct). Global Portfolio Optimization. Financial Analysts Journal, 28–43. doi:10.2469/faj.v48.n5.28

Brunnermeier, M. K. (2001). Asset Pricing under Asymmetric Information. Oxford University Press. doi:10.1093/0198296983.001.0001

Epstein, J. M., & Axtell, R. (1996). Growing Artificial Societies: Social Science from the Bottom Up. MIT Press.

Fama, E. (1970). Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance, 25, 383–417. doi:10.2307/2325486

Friedman, M. (1953). Essays in Positive Economics. University of Chicago Press.

Goldberg, D. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.

Kahneman, D., & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47, 263–291. doi:10.2307/1914185

Kahneman, D., & Tversky, A. (1992). Advances in Prospect Theory: Cumulative Representation of Uncertainty. Journal of Risk and Uncertainty, 5.

Kyle, A. S., & Wang, A. (1997). Speculation Duopoly with Agreement to Disagree: Can Overconfidence Survive the Market Test? The Journal of Finance, 52, 2073–2090. doi:10.2307/2329474

Levy, M., Levy, H., & Solomon, S. (2000). Microscopic Simulation of Financial Markets. Academic Press.

Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7, 77–91. doi:10.2307/2975974

Modigliani, F., & Miller, M. H. (1958). The Cost of Capital, Corporation Finance and the Theory of Investment. The American Economic Review, 48(3), 261–297.

Russell, S., & Norvig, P. (1995). Artificial Intelligence. Prentice-Hall.
Sharpe, W. F. (1964). Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk. The Journal of Finance, 19, 425–442. doi:10.2307/2977928

Shiller, R. J. (2000). Irrational Exuberance. Princeton University Press.

Shleifer, A. (2000). Inefficient Markets. Oxford University Press. doi:10.1093/0198292279.001.0001

Takahashi, H. (2010). An Analysis of the Influence of Fundamental Values' Estimation Accuracy on Financial Markets. Journal of Probability and Statistics, 2010.

Takahashi, H., Takahashi, S., & Terano, T. (2007). Analyzing the Influences of Passive Investment Strategies on Financial Markets via Agent-Based Modeling. In Edmonds, B., Hernandez, C., & Troutzsch, K. G. (Eds.), Social Simulation: Technologies, Advances, and New Discoveries (pp. 224–238). Hershey, PA: Information Science Reference.

Takahashi, H., & Terano, T. (2003). Agent-Based Approach to Investors' Behavior and Asset Price Fluctuation in Financial Markets. Journal of Artificial Societies and Social Simulation, 6(3).

Takahashi, H., & Terano, T. (2004). Analysis of Micro-Macro Structure of Financial Markets via Agent-Based Model: Risk Management and Dynamics of Asset Pricing. Electronics and Communications in Japan, 87(7), 38–48.

Takahashi, H., & Terano, T. (2006a). Emergence of Overconfidence Investor in Financial Markets. 5th International Conference on Computational Intelligence in Economics and Finance.

Terano, T. (2007a). Exploring the Vast Parameter Space of Multi-Agent Based Simulation. In L. Antunes & K. Takadama (Eds.), Proc. MABS 2006 (LNAI 4442, pp. 1–14).

Terano, T. (2007b). KAIZEN for Agent-Based Modeling. In S. Takahashi, D. Sallach, & J. Rouchier (Eds.), Advancing Social Simulation: The First Congress (pp. 1–6). Springer Verlag.

Terano, T., Deguchi, H., & Takadama, K. (Eds.). (2003). Meeting the Challenge of Social Problems via Agent-Based Simulation: Post Proceedings of the Second International Workshop on Agent-Based Approaches in Economic and Social Complex Systems. Springer Verlag.

Terano, T., Nishida, T., Namatame, A., Tsumoto, S., Ohsawa, Y., & Washio, T. (Eds.). (2001). New Frontiers in Artificial Intelligence. Springer Verlag. doi:10.1007/3-540-45548-5

Tesfatsion, L. (2002). Agent-Based Computational Economics. Economics Working Paper, No. 1, Iowa State University.

ENDNOTES

1. Graduate School of Business Administration, Keio University, 4-1-1 Hiyoshi, Yokohama, 223-8526, Japan. E-mail: [email protected]
2. Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259-J2-52 Nagatsuta-cho, Midori-ku, Yokohama, 226-8502, Japan. E-mail: [email protected]
APPENDICES

F_t^i: the total amount of assets of the investor i at the term t (F_0^i = 2,000: common)
w_t^i: the investment ratio of the stock of the investor i at the term t (w_0^i = 0.5: constant)
σ_t^h: the historical volatility of the stock (for the recent 100 terms)
σ_n: the standard deviation of the dispersion of the short-term expected rate of return on the stock (0.01: common)
This section explains the rules of natural selection principle. The principle used in this chapter is com-
posed of two steps: (1) selection of investors who change their investment strategies and (2) selection
of new strategy. Each step is described in the following sections:
After 25 terms have passed since the market started, each investor decides at a regular interval (every five terms) whether to change his/her strategy. The decision depends on the cumulative excess return during the most recent five terms, and investors who obtain smaller returns change their strategy with higher probability. To be more precise, an investor who obtains a negative cumulative excess return changes strategy with the probability

p_i = max(0.3 − a · exp(r_i^cum), 0),

where p_i is the probability at which investor i changes his/her own strategy.
We apply the method of genetic algorithms (Goldberg (1989)) to the selection rule for the new strategy. The investors who change their strategy tend to select a strategy that has brought a positive cumulative excess return. The probability of selecting s_i as the new strategy is given as

p_i = exp(r_i^cum) / Σ_{j=1}^{M} exp(r_j^cum),

where r_i^cum is the cumulative excess return of each investor.
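Taken together, the two steps amount to a stochastic replacement rule: decide whether to switch, then draw a new strategy in proportion to exp(r^cum). The following is a minimal Python sketch of that rule; the constant a and the (strategy, cumulative-return) bookkeeping are illustrative assumptions supplied by the surrounding simulation, not the authors' code.

```python
import math
import random

def change_probability(r_cum, a=0.01):
    # Step 1: per the text, only investors with a negative cumulative excess
    # return switch; they do so with probability max(0.3 - a * exp(r_cum), 0).
    # The value a = 0.01 is an assumed placeholder.
    return max(0.3 - a * math.exp(r_cum), 0.0) if r_cum < 0 else 0.0

def select_new_strategy(candidates):
    # Step 2: roulette-wheel selection weighted by exp(r_cum);
    # `candidates` is a list of (strategy, r_cum) pairs.
    weights = [math.exp(r_cum) for _, r_cum in candidates]
    pick = random.uniform(0.0, sum(weights))
    acc = 0.0
    for (strategy, _), w in zip(candidates, weights):
        acc += w
        if pick <= acc:
            return strategy
    return candidates[-1][0]
```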
Section 4
Multi-Agent Robotics
Chapter 9
Autonomous Specialization in
a Multi-Robot System using
Evolving Neural Networks
Masanori Goka
Hyogo Prefectural Institute of Technology, Japan
Kazuhiro Ohkura
Hiroshima University, Japan
ABSTRACT
Artificial evolution has been considered a promising approach for coordinating the controller of an autonomous mobile robot. However, it is not yet established whether artificial evolution is also effective in generating collective behaviour in a multi-robot system (MRS). In this study, two types of evolving artificial neural networks are utilized in an MRS. The first is the evolving continuous-time recurrent neural network, which is used in the conventional method, and the second is the topology and weight evolving artificial neural network, which is used in the novel method. Several computer simulations are conducted in order to examine how artificial evolution can be used to coordinate the collective behaviour in an MRS.
Figure 7. The experimental results when P = 0 (the left three graphs) and P = ∆P (the right three graphs)
for eCTRNN
Figure 13. The experimental results when P = 0 (the left three graphs) and P= ∆P (the right three
graphs) for MBEANN
Figure 14. The achieved collective behavior by MBEANN at the last generation
Figure 15. A typical result of the structure of the neural network controller for MBEANN
Figure 16. A typical result of the structure of the neural network controller for MBEANN
the robot group developed by artificial evolution. Figure 14 shows graphs for ten robots at the 500th generation in the cases where P = 0 and P = ∆P. These graphs are drawn in a manner similar to those in Figure 8. As found in graph (a), the robot groups showed almost the same team play as in the cases of eCTRNN. On the other hand, as explained above, the effect of autonomous specialization is clearly observed in graph (b). Robot No.1 and No.2 pushed the smallest package in a straight line, Robot No.3, No.4, and No.9 pushed the midsize package, and the remaining five robots pushed the largest package to the goal line.

CONCLUSION

In this chapter, two approaches in EANN, called eCTRNN and MBEANN, were discussed in the
context of the cooperative package pushing problem. Ten autonomous mobile robots successfully showed sophisticated collective behaviour as a result of autonomous specialization.

As the next step, the extension of the algorithm for artificial evolution in order to improve the success rate must be considered. Second, more computer simulations should be performed in order to examine the validity and the robustness of the ER approach in coordinating cooperative behaviour in an MRS, since the complexity of the cooperative package pushing problem can be easily varied by changing the number of robots or the arrangement of the packages.
Chapter 10
A Multi-Robot System
Using Mobile Agents with
Ant Colony Clustering
Yasushi Kambayashi
Nippon Institute of Technology, Japan
Yasuhiro Tsujimura
Nippon Institute of Technology, Japan
Hidemi Yamachi
Nippon Institute of Technology, Japan
Munehiro Takimoto
Tokyo University of Science, Japan
ABSTRACT
This chapter presents a framework using novel methods for controlling multiple mobile robots directed by mobile agents on a communication network. Instead of physical movement of multiple robots, mobile software agents migrate from one robot to another so that the robots complete their task more efficiently. In some applications, it is desirable that multiple robots draw themselves together automatically. In order to avoid excessive energy consumption, we employ mobile software agents to locate robots scattered in a field, and cause them to autonomously determine their moving behaviors by using a clustering algorithm based on the Ant Colony Optimization (ACO) method. ACO is a swarm-intelligence-based method that exploits artificial stigmergy for the solution of combinatorial optimization problems. Preliminary experiments have provided favorable results. Even though there is much room to improve the collaboration of multiple agents and ACO, the current results suggest a promising direction for the design of control mechanisms for multi-robot systems. In this chapter, we focus on the implementation of the controlling mechanism of the multi-robot system using mobile agents.
DOI: 10.4018/978-1-60566-898-7.ch010
the host computer. In the third phase, a number of mobile agents are issued from the host computer. Each mobile agent migrates to a designated robot, and directs the robot to the assigned quasi-optimal position that was calculated in the second phase. The assembly positions (clustering points) are determined by the simulation agent. It is influenced, but not determined, by the initial positions of the scattered robots. Instead of implementing ACC with actual robots, one static simulation agent performs the ACC computation, and then the set of produced positions is distributed by mobile agents. Therefore our method eliminates unnecessary physical movement and thus provides energy savings.

The structure of the balance of this chapter is as follows. In the second section, we review the history of research in this area. The third section describes the agent system that performs the arrangement of the multiple robots. The fourth section describes the ACC algorithm we have employed to calculate the quasi-optimal assembly positions. The fifth section demonstrates the feasibility of our system by implementing an actual multi-robot system that orients itself using RFID tags in its environment. The sixth section discusses our quantitative experiments and our observations from the preliminary experiments. Finally, we conclude in the seventh section and discuss future research directions.

BACKGROUND

Kambayashi and Takimoto have proposed a framework for controlling intelligent multiple robots using higher-order mobile agents (Kambayashi & Takimoto, 2005; Takimoto, Mizuno, Kurio & Kambayashi, 2007; Nagata, Takimoto & Kambayashi, 2009). The framework helps users to construct intelligent robot control software using migration of mobile agents. Since the migrating agents are of higher order, the control software can be hierarchically assembled while they are running. Dynamic extension of control software by the migration of mobile agents enables the controlling agent to begin with relatively simple base control software, and to add functionalities one by one as it learns the working environment. Thus we do not have to make the intelligent robot smart from the beginning or make the robot learn by itself. The controlling agent can send intelligence later through new agents. Even though the dynamic extension of the robot control software using the higher-order mobile agents is extremely useful, such a higher-order property is not necessary in our setting. We have employed a simple, non-higher-order mobile agent system for our framework. We previously implemented a team of cooperative search robots to show the effectiveness of such a framework, and demonstrated that that framework contributes to energy savings for a task achieved by multiple robots (Takimoto, Mizuno, Kurio & Kambayashi, 2007; Nagata, Takimoto & Kambayashi, 2009). Our simple agent system should achieve similar performance.

Deneuburg formulated the biologically inspired behavioral algorithm that simulates the ant corps gathering and brood sorting behaviors (Deneuburg, Goss, Franks, Sendova-Franks, Detrain, & Chretien, 1991). His algorithm captured many features of the ant sorting behaviors. His design consists of ants picking up and putting down objects in a random manner. He further conjectured that robot team design could be inspired from the ant corps gathering and brood sorting behaviors (Deneuburg, Goss, Franks, Sendova-Franks, Detrain, & Chretien, 1991). Wang and Zhang proposed an ant inspired approach along this line of research that sorts objects with multiple robots (Wang & Zhang, 2004).

Lumer improved Deneuburg's model and proposed a new simulation model that was called Ant Colony Clustering (Lumer, & Faieta, 1994).
His method could cluster similar objects into a few groups based on a measure of the similarity between two data objects, and he designed an algorithm for data clustering. Chen et al. have further improved Lumer's model and proposed the Ants Sleeping Model (Chen, Xu & Chen, 2004). The artificial ants in Deneuburg's model and Lumer's model have a considerable amount of random idle moves before they pick up or put down objects, and a considerable amount of repetitions occur during the random idle moves. In Chen's ASM model, an ant has two states: active state and sleeping state. When the artificial ant locates a comfortable and secure position, it has a higher probability of being in the sleeping state. Based on ASM, Chen has proposed an Adaptive Artificial Ants Clustering Algorithm that achieves better clustering quality with less computational cost.

Algorithms inspired by behaviors of social insects such as ants that communicate with each other by stigmergy are becoming popular (Dorigo & Gambardella, 1996) and widely used in solving complex problems (Toyoda & Yano, 2004; Becker & Szczerbicka, 2005). Upon observing real ants' behaviors, Dorigo et al. found that ants exchanged information by laying down a trail of a chemical substance (pheromone) that is followed by other ants. They adopted this ant strategy, known as ant colony optimization (ACO), to solve various optimization problems such as the traveling salesman problem (TSP) (Dorigo & Gambardella, 1996). Our ACC algorithm employs pheromone, instead of using Euclidean distance, to evaluate its performance.

THE MOBILE AGENTS

Robot systems have made rapid progress in not only their behaviors but also in the way they are controlled (Murphy, 2000). Multi-agent systems introduced modularity, reconfigurability and extensibility to control systems, which had been traditionally monolithic. It has made easier the development of control systems in distributed environments such as intelligent multi-robot systems. On the other hand, excessive interactions among agents in the multi-agent system may cause problems in the multi-robot environment. Consider a multi-robot system where each robot is controlled by an agent, and interactions among robots are achieved through a communication network such as a wireless LAN. Since the circumstances around the robot change as the robots move, the condition of each connection among the various robots also changes. In this environment, when some of the connections in the network are disabled, the system may not be able to maintain consistency among the states of the robots. Such a problem has a tendency to increase as the number of interactions increases.

In order to lessen the problems of excessive communication, mobile agent methodologies have been developed for distributed environments. In the mobile agent system, each agent can actively migrate from one site to another site. Since a mobile agent can bring the necessary functionalities with it and perform its tasks autonomously, it can reduce the necessity for interaction with other sites. In the minimal case, a mobile agent requires that the connection be established only when it performs migration (Binder, Hulaas & Villazon, 2001). Figure 1 shows a conceptual diagram of a mobile agent migration. This property is useful for controlling robots that have to work in a remote site with unreliable or intermittent communication. The concept of a mobile agent also creates the possibility that new functions and knowledge can be introduced to the entire multi-agent system from a host or controller outside the system via a single accessible member of the intelligent multi-robot system (Kambayashi & Takimoto, 2005).

Our system model consists of robots and a few kinds of static and mobile software agents. All the controls for the mobile robots as well as the ACC computation performed in the host computer are achieved through the static and mobile agents. They are: 1) user interface agent (UIA), 2) operation agents (OA), 3) position collecting agent
(PCA), 4) clustering simulation agent (CSA), and 5) driving agents (DA). All the software agents except UIA and CSA are mobile agents. A mobile agent (PCA) traverses the robots scattered in the field to collect their coordinates. After receiving the assembly positions computed by a static agent (CSA), many mobile agents (DAs) migrate to the robots and drive them to the assembly positions. Figure 2 shows the interactions of the cooperative agents to control a mobile robot.

The functionality of each agent is described as follows:

1. User Interface Agent (UIA): The user interface agent (UIA) is a static agent that resides on the host computer and interacts with the user. It is expected to coordinate the entire agent system. When the user creates this agent with a list of IP addresses of the mobile robots, UIA creates PCA and passes the list to it.
2. Operation Agent (OA): Each robot has at least one operation agent (OA). It has the task that the robot on which it resides is supposed to perform. Each mobile robot has its own OA. Currently all operation agents (OA) have a function for collision avoidance and a function to sense RFID tags, which are embedded in the floor carpet, to detect its precise coordinates in the field.
3. Position Collecting Agent (PCA): A distinct agent called the position collecting agent (PCA) traverses the mobile robots scattered in the field to collect their coordinates.
PCA is created and dispatched by UIA. Upon returning to the host computer, it hands the collected coordinates to the clustering simulation agent (CSA) for ACC.
4. Clustering Simulation Agent (CSA): The host computer houses the static clustering simulation agent (CSA). This agent actually performs the ACC algorithm by using the coordinates collected by PCA as the initial positions, and produces the quasi-optimal assembly positions of the mobile robots. Upon terminating the computation, CSA creates a number of driving agents (DA).
5. Driving Agent (DA): The quasi-optimal arrangement coordinates produced by the CSA are delivered by driving agents (DA). One driving agent is created for each mobile robot, and it contains the set of procedures for the mobile robot. The DA drives its mobile robot to the designated assembly position.

OA detects the current coordinates of the robot on which it resides. Each robot has its own IP address, and UIA hands in the list of the IP addresses to PCA. First, PCA migrates to an arbitrary robot and starts hopping between the robots one by one. It communicates locally with OA, and writes the coordinates of the robot into its own local data area. When PCA gets all the coordinates of the robots, it returns to the host computer. UIA waits a certain period for PCA's return. If UIA does not hear from PCA for a certain period, it declares "time-out" and cancels the PCA. Then UIA re-generates a new PCA with a new identification number. On the other hand, if PCA cannot find a robot with one of the IP addresses on the list, it retries a certain number of times and then declares the missing robot to be "lost." PCA reports that fact to UIA. Upon returning to the host computer, PCA creates CSA and hands in the coordinate data to CSA, which computes the ACC algorithm. We employ RFID (Radio Frequency Identification) tagging to get precise coordinates. We set RFID tags in a regular grid shape under the floor carpet tiles. The tags we chose have a small range so that the position-collecting agent can obtain fairly precise coordinates from the tag. The robots also have a basic collision avoidance mechanism using infrared sensors.

CSA is the other static agent whose sole role is ACC computation. When CSA receives the coordinate data of all the robots, it translates them into coordinates for simulation, and performs the clustering. When CSA finishes the computation and produces a set of assembly positions, it then creates the set of procedures for autonomous robot movements.

CSA creates DAs that convey the set of procedures to the mobile robots. Each DA receives its destination IP address from PCA, and the set of procedures for the destination robot, and then migrates to the destination robot. Each DA has a set of driving procedures that drives its assigned robot to the destination, while it avoids collision. OA has the basic collision detection and avoidance procedures, and DA has task-specific collision avoidance guidance, such as the coordinates of pillars and how to avoid them.
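The division of labor just described — PCA collects positions, CSA clusters them, DAs deliver driving procedures — can be summarized as a single host-side control loop. The following Python sketch shows that flow under assumed, illustrative interfaces (collect_positions, run_acc, dispatch); it is a reading of the description above, not the authors' Agent Space implementation.

```python
def control_cycle(robot_addresses, collect_positions, run_acc, dispatch):
    """One cycle of the host-side control flow described above.

    robot_addresses   -- list of robot IP addresses supplied by the user (UIA)
    collect_positions -- PCA-like callable: addresses -> {address: (x, y)}
    run_acc           -- CSA-like callable: positions -> {address: (goal_x, goal_y)}
    dispatch          -- DA-like callable: (address, goal) -> None (drives one robot)
    """
    # PCA phase: visit every robot and gather its coordinates.
    positions = collect_positions(robot_addresses)

    # CSA phase: run ant colony clustering on the collected coordinates
    # to obtain a quasi-optimal assembly position for each robot.
    assembly = run_acc(positions)

    # DA phase: one driving agent per robot carries the goal to the robot.
    for address, goal in assembly.items():
        dispatch(address, goal)
```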
We have implemented the prototype of the multi-agent system for mobile robot control using Agent Space (Satoh, 1999). Agent Space is a library for constructing mobile agents developed by Satoh. By using this library, the user can implement a mobile agent environment in the Java language.

In Agent Space, mobile agents are defined as collections of call-back methods, and we implement the contents of the methods with the interfaces defined in the system. In order to create a mobile agent, the application calls the create method. An agent migrates to another site by using the move and leave methods. When an agent arrives, the arrive method is invoked. Migration by an agent is achieved by its duplication of itself at the destination site. Thus the move and leave methods are used as a pair of methods for actual migration. Figure 3 shows the move method as an example. The other methods are implemented similarly. The users are expected to implement a destructor to erase the original agent in the leave method. Agent Space also provides services in its Application Program Interface (API) such as the move method to migrate agents and the invoke method to communicate with another agent. Figures 4 and 5 show how PCA and DA work, respectively.

The following is the PCA implementation:

1. UIA invokes the create method to create the mobile agent PCA, and hands in the list of the IP addresses of the mobile robots to PCA.
2. PCA invokes the move method so that it can migrate to the mobile robot specified at the top of the list of IP addresses.
3. PCA invokes the leave method.
4. The agent actually migrates to the specified mobile robot.
5. PCA invokes the arrive method in the destination robot, and communicates locally with the OA in order to receive the coordinates of the robot.
6. PCA checks the next entry of the IP address list; if it has visited all the mobile robots, it returns to the host computer, otherwise it migrates to the next mobile robot with the IP address of the next entry in the list.
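The six PCA steps above amount to a traversal loop, combined with the time-out and missing-robot handling described earlier for UIA and PCA. The following is a minimal Python sketch of that loop; the visit callable and the retry count are illustrative assumptions and are not part of the Agent Space API.

```python
def collect_positions(robot_addresses, visit, retries=3):
    """PCA-style traversal: hop from robot to robot and gather coordinates.

    visit(address) is assumed to return the robot's (x, y) coordinates,
    or raise ConnectionError if the robot cannot be reached.
    Robots still unreachable after `retries` attempts are reported as lost.
    """
    positions, lost = {}, []
    for address in robot_addresses:
        for _attempt in range(retries):
            try:
                positions[address] = visit(address)   # local talk with the OA
                break
            except ConnectionError:
                continue
        else:
            lost.append(address)                      # declare the robot "lost"
    return positions, lost
```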
The following is the DA implementation:

1. CSA creates as many mobile agents DA as the number of the mobile robots in the field.
2. Each DA receives the IP address to which it is supposed to migrate, and the set of the procedures to drive the robot.
3. The agents actually migrate to the specified mobile robots.
4. Each DA invokes the arrive method in the destination robot, constructs the sequence of commands from the given procedures, and then communicates with the robot control software, called RCC (Robot Control Center), in the notebook computer on the robot in order to actually drive the mobile robot.
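A driving agent therefore does little more than translate its procedure list into concrete robot commands and hand them to RCC. The sketch below is a schematic Python rendering of step 4; the command vocabulary and the send_to_rcc function are invented for illustration and do not reflect the real RCC interface.

```python
def drive_robot(procedures, send_to_rcc):
    """DA-style step 4: turn abstract driving procedures into robot commands.

    procedures  -- e.g. [("rotate", 90), ("forward", 1.5), ("forward", 0.5)]
    send_to_rcc -- callable that forwards one command string to RCC
    """
    for action, value in procedures:
        if action == "rotate":
            send_to_rcc(f"rotate {value:.1f}")    # degrees
        elif action == "forward":
            send_to_rcc(f"move {value:.2f}")      # meters
        else:
            raise ValueError(f"unknown driving procedure: {action}")
```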
ANT COLONY CLUSTERING

In this section, we describe our ACC algorithm to determine the quasi-optimal assembly positions for multiple robots. Coordination of an ant colony is achieved through indirect communication through pheromones. In previously explored ACO systems, artificial ants leave pheromone signals so that other artificial ants can trace the same path (Deneuburg, Goss, Franks, Sendova-Franks, Detrain, & Chretien, 1991). In our ACC system, however, we have attributed pheromone to objects so that more objects are clustered in a place where strong pheromone is sensed. The simulation agent, CSA, performs the ACC algorithm as a simulation. The field of the simulation has the coordinates of the objects and their pheromone values, so that the artificial ants can obtain all the necessary information (coordinates and pheromone) from the simulation field.

Randomly walking artificial ants have a high probability of picking up an object with weak pheromone, and putting the object where they sense strong pheromone. They are designed not to walk long distances, so that the artificial ants tend to pick up scattered objects and produce many small clusters of objects. When a few clusters are generated, those clusters tend to grow.

Since the purpose of traditional ACC is clustering or grouping objects into several different classes based on selected properties, it is desirable that the generated chunks of clusters grow into one big cluster such that each group has distinct characteristics. In our system, however, we want to produce several roughly clustered groups of the same type, and to minimize the movement of each robot. (We assume we have one kind of cart robot, and we do not want robots to move long distances.) Therefore our artificial ants have the following behavioral rules:

1. An artificial ant's basic behavior is a random walk. When it finds an isolated object, it picks it up.
2. When the artificial ant finds a cluster with a certain number of objects, it tends to avoid picking up an object from the cluster. This number can be updated later.
3. When the artificial ant with an object finds a cluster, it puts down the object so that the object is adjacent to one of the objects in the cluster.
4. If the artificial ant cannot find any cluster with a certain strength of pheromone, it just continues its random walk.

By the restrictions defined in the above rules, the artificial ants tend not to convey objects over a long distance, and produce many small heaps of objects at the first stage. In order to implement the first feature, the system locks objects with a certain number of adjoining objects, and no artificial ant can pick up such a locked object. The number for locking will be updated later so that artificial ants can bring previously locked objects in order to create larger clusters. When the initially scattered objects are clustered into a small number of heaps, the number of objects that causes objects to be locked is updated, and the activities of the artificial ants re-start to produce a smaller number
of clusters. We describe the behaviors of the artificial ants below.

In the implementation of our ACC algorithm, when the artificial ants are generated, they have randomly supplied initial positions and walking directions. An artificial ant performs a random walk; when it finds an unlocked object, it picks up the object and continues the random walk. During its random walk, when it senses strong pheromone, it puts down the conveyed object. The artificial ants repeat this simple procedure until the termination condition is satisfied. Figure 6 shows the behavior of an artificial ant. We explain several specific actions that each artificial ant performs below.

The base behavior for all the artificial ants is the random walk. An artificial ant that does not have any object is supposed to move straight in a randomly determined initial direction for ten steps, at which time the ant randomly changes direction. The ant also performs side steps from time to time to create further randomness.

When the artificial ant finds an object during its random walk, it determines whether to pick it up or not based on whether the object is locked or not, and on the strength of pheromone the object has, according to the value of formula (1) below. An artificial ant will not pick up any locked object. Whether an object is locked or not is also determined by formula (1). Here, p is the density of pheromone and k is a constant value. p itself is a function of the number of objects and the distance from the cluster of objects. The formula
simply says that the artificial ant does not pick up an object with strong pheromone.

f(p) = 1 − ((p + l) · k) / 100    (1)

Currently, we choose p as the number of adjacent objects. Thus, when an object is completely surrounded by other objects, p is the maximum value, set in our experiments to be nine. We choose k to be equal to thirteen in order to prevent any object surrounded by eight other objects from being picked up. Then, the computed value of f(p) = 0 (never pick it up). l is a constant value at which an object is locked. Usually l is zero (not locked). When the number of clusters becomes less than two thirds of the number of total objects and p is greater than three, we set l to six. When the number of clusters becomes less than one third of the number of total objects and p is greater than seven, l becomes three. When the number of clusters becomes less than the number of the user setting and p is greater than nine, l becomes one. Any objects that meet these conditions are deemed to be locked. This "lock" process prevents artificial ants from removing objects from growing clusters, and contributes to stabilizing the clusters' relatively monotonic growth.

When an artificial ant picks up an object, it changes its state into "pheromone walk." In this state, an artificial ant tends to move probabilistically toward the place where it senses the strongest pheromone. The probability that the artificial ant takes a certain direction is n/10, where n is the strength of the sensed pheromone in that direction. Figure 7 shows the strengths of pheromones and their scope. This mechanism causes the artificial ants to move toward the nearest cluster, and consequently minimizes the moving distance.

An artificial ant carrying an object determines whether to put down the object or to continue to carry it. This decision is made based on formula (2): the more strongly it senses pheromone, the more it tends to put down the carried object. Here, p and k are the same as in formula (1). The formula simply says that when the artificial ant bumps into a locked object, it must put the carried object next to the locked object; then, the value of f(p) = 1 (must put it down).

f(p) = (p · k) / 100    (2)
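Formulas (1) and (2) can be read as pick-up and put-down probabilities clamped to [0, 1]. The Python sketch below implements that reading with the constants quoted in the text (k = 13, p up to nine); the clamping and the random draws are assumptions about how the probabilities are used, not details given by the authors.

```python
import random

K = 13  # constant k from formulas (1) and (2)

def pick_up_probability(p, l):
    # Formula (1): likelihood of picking up an object with p adjacent
    # objects and lock constant l, clamped to [0, 1].
    return min(max(1.0 - ((p + l) * K) / 100.0, 0.0), 1.0)

def put_down_probability(p):
    # Formula (2): likelihood of putting the carried object down next to
    # an object with p neighbours, clamped to [0, 1].
    return min((p * K) / 100.0, 1.0)

def decide_pick_up(p, l, locked):
    # A locked object is never picked up; otherwise draw against formula (1).
    if locked:
        return False
    return random.random() < pick_up_probability(p, l)

def decide_put_down(p):
    return random.random() < put_down_probability(p)
```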
Conventional ACC algorithms terminate when all the objects are clustered in the field, or when a predefined number of steps has been executed (Chen, Xu & Chen, 2004). In such conditions, however, the clustering may be over before obtaining satisfactory clusters. Therefore we set the termination condition of our ACC algorithm to be that the number of resulting clusters is less than ten, and all the clusters have three or more objects. This condition may cause a longer computation time than the usual ACC, but preliminary experiments show that it produces reasonably good clustering.

THE ROBOTS

In this section, we demonstrate that the model of static and mobile agents with ant colony clustering (ACC) is suitable for intelligent multi-robot systems. We have employed the ER1 Personal Robot Platform Kit by Evolution Robotics Inc. as the platform for our prototype (Evolution Robotics, 2008). Each robot has two servomotors with tires. The power is supplied by a rechargeable battery. It has a servomotor controller board that accepts RS-232C serial data from a host computer. Each robot holds one notebook computer as its host computer. Our control mobile agents migrate to these host computers by wireless LAN. One robot control application, called RCC (robot control center), resides on each host computer. Our mobile agents communicate with RCC to receive sensor data. Figure 8 shows a team of mobile multiple robots working under the control of mobile agents.

In the previous implementation, an agent on the robot calculates the current coordinates from the initial position, and determines where the computed coordinates are different from the actual positions (Ugajin, Sato, Tsujimura, Yamamoto, Takimoto & Kambayashi, 2007). The current implementation employs RFID (Radio Frequency Identification) to get precise coordinates. We set RFID tags in a regular grid shape under the floor carpet tiles. The tags we chose have a small range so that the position-collecting agent can obtain fairly precise coordinates from the tag. Figure 9 shows the RFID under a floor carpet tile. The robot itself has a basic collision avoidance mechanism using infrared sensors.

For driving robots along a quasi-optimal route, one needs not only the precise coordinates of each robot but also the direction each robot faces. In order to determine the direction that it is facing, each robot moves straight ahead in the direction
Figure 9. RFID under a carpet tile
Figure 10. RFID tags in square formation

it is currently facing and obtains two positions (coordinates) from RFID tags under the carpet tiles. Determining the current orientation is important because there is a high cost for making a robot rotate through a large angle. It is desirable for each robot to be assigned rather simple forward movements rather than complex movements with several direction changes when there are obstacles to avoid. Therefore, whenever OA is awake, it performs the two actions, i.e. obtaining the current position and calculating the current direction.

Since each carpet tile has nine RFID tags, as shown in Figure 10, the robot is supposed to obtain the current position as soon as OA gets the sensor data from the RFID module. If OA cannot obtain the position data, it means the RFID module cannot sense an RFID tag, and OA makes the robot rotate until the RFID module senses an RFID tag (usually a small degree). Once OA obtains the current position, it drives the robot a short distance until the RFID module detects the second RFID tag. Upon obtaining two positions, a simple computation determines the direction in which the robot is moving, as shown in Figure 11.
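The "simple computation" mentioned above is essentially a heading estimate from two successive tag readings. A minimal sketch, assuming planar (x, y) coordinates in the same frame for both readings:

```python
import math

def heading_from_two_fixes(first, second):
    """Estimate the robot's heading (degrees, counter-clockwise from the
    positive x axis) from two successive RFID position fixes (x, y)."""
    dx = second[0] - first[0]
    dy = second[1] - first[1]
    if dx == 0 and dy == 0:
        raise ValueError("the two fixes coincide; drive a little further")
    return math.degrees(math.atan2(dy, dx))
```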
In a real situation, we may find the robot close to the wall and facing it. Then we cannot use the simple method just described above. The robot may move forward and collide with the wall. The robot has two infrared sensors, but their ranges are very short, and when they sense the wall, it is often too late to compute and execute a response. In order to accommodate such situations, we make the RFID tags near the wall give a special signal to the robot that tells it that it is at the end of the field (near the wall), so that the robot that senses the signal can rotate to the opposite direction, as
Figure 13. Clustered objects constructed by assembling at positions computed by ACC and at predefined positions
clusters converge into a few big clusters. Then the artificial ants can see ten to twenty positions ahead. As a result of our experiments, we realize that we need to continue to improve our ACC algorithm.

CONCLUSION AND FUTURE DIRECTIONS

We have presented a framework for controlling multiple mobile robots connected by communication networks. Mobile and static agents collect the coordinates of the scattered mobile robots and implement the ant colony clustering (ACC) algorithm in order to find quasi-optimal positions at which to assemble the mobile robots. Making the mobile robots themselves perform the ant colony optimization is enormously inefficient. Therefore a static agent performs the ACC algorithm in its simulator and computes the quasi-optimal positions for the mobile robots. Then other mobile agents carrying the requisite set of procedures migrate to the mobile robots, and so direct the robots using the sequence of robot control commands constructed from the given set of procedures.

Since our control system is composed of several small static and mobile agents, it shows excellent scalability. When the number of mobile robots increases, we can increase the number of mobile software agents to direct the mobile robots. The user can enhance the control software by introducing new features as mobile agents so that the multi-robot system can be extended dynamically while the robots are working. Also, mobile agents decrease the amount of necessary communication. They make mobile multi-robot applications possible in remote sites with unreliable communication. In unreliable communication environments, the multi-robot system may not be able to maintain consistency among the states of the robots in a centrally controlled manner. Since a mobile agent can bring the necessary functional-
ities with it and perform its tasks autonomously, it can reduce the necessity for interaction with other sites. In the minimal case, a mobile agent requires that the connection be established only when it performs migration (Binder, Hulaas & Villazon, 2001). The concept of a mobile agent also creates the possibility that new functions and knowledge can be introduced to the entire multi-agent system from a host or controller outside the system via a single accessible member of the intelligent multi-robot system (Kambayashi & Takimoto, 2005). While our imaginary application is simple cart collection, the system should have a wide variety of applications.

We have implemented a team of mobile robots to show the feasibility of our model. In the current implementation, an agent on the robot can obtain fairly precise coordinates of the robots from RFID tags.

The ACC algorithm we have proposed is designed to minimize the total distance objects are moved. We have analyzed and demonstrated the effectiveness of our ACC algorithm through simulation, performing several numerical experiments with various settings. Although we have so far observed favorable results from the experiments in the simulator, applying the results of the simulation to a real multi-robot system is difficult. Analyzing the results of the simulation, we often find that the sum of the moving distances of all the robots is not minimal, as we expected. We have re-implemented the ACC algorithm to use only the sum of moving distances and have found some improvement. Even though we believe that the multi-agent framework for controlling multi-robot systems is the right direction, we have to overcome several problems before constructing a practical working system.

Compared with the time for robot movements, the computation time for the ACC algorithm is negligible. Even if the number of artificial ants increases, the computation time will increase linearly, and the number of objects should not influence the computation's complexity, because any one step of each ant's behavior is simple and we can assume it takes constant execution time. Even though this is apparently obvious, we need to confirm it with quantitative experiments.

One big problem is to determine how we should include the collision avoidance behaviors of robots in the simulation. We need to quantify real robot movements more completely. Collision avoidance itself is a significant problem because the action of clustering means creating a jam of moving robots. Each driving agent must maneuver its robot precisely to the destination while avoiding colleague robots as it dynamically determines its destination coordinates. This task requires much more intelligence than we had expected early in this project.

During the experiments, we experienced the following unfavorable situations:

1. Certain initial arrangements of objects cause very long periods for clustering,
2. If one large cluster is created at the early stage of the clustering and the rest of the field has scarce objects, then all the objects are assembled into one large cluster. This situation subsequently makes the aggregate moving distance long, and
3. As a very rare case, the simulation does not converge.

Even though such cases are rare, these phenomena suggest further avenues of research for our ACC algorithm. As we mentioned in the previous section, we need to design the artificial ants to have certain complex features that change their ability to adapt to circumstances. We defer this investigation to our future work.

On the other hand, when a certain number of clusters have emerged and stabilized, we can coerce them into several (three or four) clusters by calculating the optimal assembly points. This coercion to required assembly points should be one of the other directions for our future work. We may also investigate computing the number
of clusters and their rough positions prior to performing the ACC algorithm, so that we can save much computation time. In many ways, we have room to improve our assembly point calculation method before integrating everything into one working multi-robot system.

ACKNOWLEDGMENT

We appreciate Kimiko Gosney, who gave us useful comments. This work is partially supported by the Japan Society for the Promotion of Science (JSPS), with the basic research program (C) (No. 20510141), Grant-in-Aid for Scientific Research.

REFERENCES

Becker, M., & Szczerbicka, H. (2005). Parameters Influencing the Performance of Ant Algorithm Applied to Optimisation of Buffer Size in Manufacturing. Industrial Engineering and Management Systems, 4(2), 184–191.

Binder, W. J., Hulaas, G., & Villazon, A. (2001). Portable Resource Control in the J-SEAL2 Mobile Agent System. In Proceedings of the International Conference on Autonomous Agents (pp. 222-223).

Chen, L., Xu, X., & Chen, Y. (2004). An adaptive ant colony clustering algorithm. In Proceedings of the Third IEEE International Conference on Machine Learning and Cybernetics (pp. 1387-1392).

Deneuburg, J., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C., & Chretien, L. (1991). The Dynamics of Collective Sorting: Robot-Like Ant and Ant-Like Robot. In Proceedings of the First Conference on Simulation of Adaptive Behavior: From Animals to Animats (pp. 356-363). Cambridge: MIT Press.

Dorigo, M., & Gambardella, L. M. (1996). Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman. IEEE Transactions on Evolutionary Computation, 1(1), 53–66. doi:10.1109/4235.585892

Evolution Robotics Ltd. Homepage (2008). Retrieved from http://www.evolution.com/

Kambayashi, Y., Sato, O., Harada, Y., & Takimoto, M. (2009). Design of an Intelligent Cart System for Common Airports. In Proceedings of the 13th International Symposium on Consumer Electronics. CD-ROM.

Kambayashi, Y., & Takimoto, M. (2005). Higher-Order Mobile Agents for Controlling Intelligent Robots. International Journal of Intelligent Information Technologies, 1(2), 28–42.

Kambayashi, Y., Tsujimura, Y., Yamachi, H., Takimoto, M., & Yamamoto, H. (2009). Design of a Multi-Robot System Using Mobile Agents with Ant Colony Clustering. In Proceedings of the Hawaii International Conference on System Sciences. IEEE Computer Society. CD-ROM.

Lumer, E. D., & Faieta, B. (1994). Diversity and Adaptation in Populations of Clustering Ants. In From Animals to Animats 3: Proceedings of the 3rd International Conference on the Simulation of Adaptive Behavior (pp. 501-508). Cambridge: MIT Press.

Murphy, R. R. (2000). Introduction to AI Robotics. Cambridge: MIT Press.

Nagata, T., Takimoto, M., & Kambayashi, Y. (2009). Suppressing the Total Costs of Executing Tasks Using Mobile Agents. In Proceedings of the 42nd Hawaii International Conference on System Sciences. IEEE Computer Society. CD-ROM.
Sato, O., Ugajin, M., Tsujimura, Y., Yamamoto, H., & Kambayashi, Y. (2007). Analysis of the Behaviors of Multi-Robots that Implement Ant Colony Clustering Using Mobile Agents. In Proceedings of the Eighth Asia Pacific Industrial Engineering and Management System. CD-ROM.

Satoh, I. (1999). A Mobile Agent-Based Framework for Active Networks. In Proceedings of the IEEE Systems, Man, and Cybernetics Conference (pp. 161-168).

Takimoto, M., Mizuno, M., Kurio, M., & Kambayashi, Y. (2007). Saving Energy Consumption of Multi-Robots Using Higher-Order Mobile Agents. In Proceedings of the First KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications (LNAI 4496, pp. 549-558).

Toyoda, Y., & Yano, F. (2004). Optimizing Movement of a Multi-Joint Robot Arm with Existence of Obstacles Using Multi-Purpose Genetic Algorithm. Industrial Engineering and Management Systems, 3(1), 78–84.

Ugajin, M., Sato, O., Tsujimura, Y., Yamamoto, H., Takimoto, M., & Kambayashi, Y. (2007). Integrating Ant Colony Clustering Method to Multi-Robots Using Mobile Agents. In Proceedings of the Eighth Asia Pacific Industrial Engineering and Management System. CD-ROM.

Wang, T., & Zhang, H. (2004). Collective Sorting with Multi-Robot. In Proceedings of the First IEEE International Conference on Robotics and Biomimetics (pp. 716-720).

KEY TERMS AND DEFINITIONS

Mobile Robot: In contrast to an industrial robot, which usually consists of a multi-linked manipulator and an end effector attached to a fixed surface, a mobile robot has the capability to move around in its environment. "Mobile robot" often implies autonomy. Autonomous robots can perform desired tasks in unstructured environments with minimal user intervention.

Mobile Agent: A piece of program that can migrate from one computational site to another computational site while it is under execution.

Multi-Robots: A set of mobile robots. They are relatively small and expected to achieve given tasks by cooperating with each other.

Intelligent Robot Control: A method to control mobile robots. This method allows mobile robots to behave autonomously, reducing user interventions to a minimum.

Swarm Intelligence: The property of a system whereby the collective behaviors of (mobile) agents interacting locally with their environment cause coherent functional global patterns to emerge. A swarm has been defined as a set of (mobile) agents which are liable to communicate directly or indirectly with each other, and which collectively carry out distributed problem solving.

Ant Colony Optimization: A probabilistic technique inspired by the behaviors of social insects, "ants." It was proposed as a method for solving hard combinatorial optimization problems, and is known to be useful for solving computational problems which can be reduced to finding good paths through graphs.

Clustering Algorithm: An algorithm that extracts similar data items from unstructured data and groups them into several clusters.
Section 5
Multi-Agent Games and
Simulations
Chapter 11
The AGILE Design of
Reality Game AI
Robert G. Reynolds
Wayne State University, USA
John O’Shea
University of Michigan-Ann Arbor, USA
Xiangdong Che
Wayne State University, USA
Yousof Gawasmeh
Wayne State University, USA
Guy Meadows
University of Michigan-Ann Arbor, USA
Farshad Fotouhi
Wayne State University, USA
ABSTRACT
This chapter investigates the use of agile program design techniques within an online game develop-
ment laboratory setting. The proposed game concerns the prediction of early Paleo-Indian hunting sites
in ancient North America along a now submerged land bridge that extended between Canada and the
United States across what is now Lake Huron. While the survey of the submerged land bridge was being
conducted, the online class was developing a computer game that would allow scientists to predict where
sites might be located on the landscape. Crucial to this was the ability to add in gradually different levels
of cognitive and decision-making capabilities for the agents. We argue that the online component of the
courses was critical to supporting an agile approach here. The results of the study indeed provided a
fusion of both survey and strategic information that suggest that movement of caribou was asymmetric
over the landscape. Therefore, the actual positioning of human artifacts such as hunting blinds was
designed to exploit caribou migration in the fall, as is observed today.
DOI: 10.4018/978-1-60566-898-7.ch011
occupation, both archaeologically and via the game play. It was of interest to see how the two sources related to each other at the end of the term. The course was set up to allow the gradual layering of intelligence into each of the major agents, caribou and humans, through the support of an explicitly agile methodology. In fact, we felt that the on-line aspect of the course would play an important role in the application of the methodology since it would facilitate student communication and allow the archaeologists to have access to class discussions and demonstrations, and to provide feedback. Given the physical separation of the data collection and the software development sites, a face-to-face approach was not possible. In this chapter, we use on-line technology to substitute for the face-to-face interaction required by the Agile Development methodology. Our goal will be to see if this approach can still support the agile development of a real-world game for this application.

In section 2 the Land Bridge project is briefly described. Section 3 describes how the 12 basic tenets of agile programming, its manifesto, are supported within the on-line course framework. Then, in section 4 the basic steps in the layering of the social intelligence using Cultural Algorithms into the system through laboratory and lecture assignments are given. Section 5 provides an example of how the project developed within this context. Section 6 concludes the chapter by assessing how the project results correspond to the survey results and describes future work.

THE LAKE STANLEY LAND BRIDGE

Overview

It is difficult, if not impossible, to consider the character of the Paleo-Indian and Early Archaic occupation of the Great Lakes region without reference to the grudging withdrawal of the continental ice sheet, and the subsequent rises and drops in the waters of the Great Lakes that accompanied the region's gradual transition to its modern appearance. Archaeologists have used the sequence of high water beaches to date these early sites, although with few exceptions the sites have

Figure 1. Bathymetry of Lake Huron. The land bridge is bright yellow and labeled the Alpena-Amberley Ridge on the map.
been thoroughly disturbed by later land use, and their faunal assemblages have been dissolved by the acid forest soils. For sites associated with the periods of low water, there are often no surface sites to be found at all. Some can be found deeply buried beneath later lake sediments, and many more are presumed lost forever, somewhere out under the lakes.

Newly released high resolution bathymetry of the Great Lakes, coupled with advances in 3-D surface modeling, makes it possible to once again view the ancient landforms from these low water periods. This in turn raises the possibility of discovering the early human settlements and activity sites that existed in this environment. In this project, we seek to explore this potential with respect to the earliest of the major low water stands in the Lake Huron basin, Lake Stanley, and the unique causeway structure that once linked Michigan with Ontario. This causeway or "land bridge" would have been available for colonization by plant, animal, and human communities. It would have also been a vehicle to support the movement of animals, specifically caribou, across Lake Huron during spring and fall migrations.

The Survey and Data Collection Methodology

The post-glacial history of the Great Lakes is characterized by a series of high and low water stands produced by the interaction of early Holocene climate, the flows of glacial melt waters and the isostatic rebound of recently deglaciated land surfaces (Lewis, et al., 1994; Moore, Rea, Mayer, Lewis, & Dobson, 1994; Larsen, 1999), as shown in Figure 2. The most extreme of the low water stands in the Lake Huron basin is referred to as the Lake Stanley stage (Lake Chippewa in the Lake Michigan basin), which spanned roughly 10,000 to 7500 BP and was associated with lake levels as low as 80-90 m amsl (compared to the modern standard of 176 masl (meters above sea level)) (Lewis et al., 2007).

When projected at these levels, the Lake Huron basin contains two lakes separated by a ridge or causeway extending northwest to southeast across the basin from the area of Presque Isle, Michigan to Point Clark in Ontario. The causeway, termed the Alpena–Amberley Ridge, averages 16 km in width (Figure 1) and is capped with glacial till and Middle Devonian limestone and dolomite (Hough, 1958; Thomas, Kemp, & Lewis, 1973). It is represented via a topographic map where the

Figure 2. The history of the ancient Great Lakes. As the glaciers retreated, a precursor to Lake Huron was formed, Lake Stanley, around 9000 B.P.
three targeted survey regions are highlighted by boxes. Region 3, in the middle of the three, was the target for the first surveys here.

The earliest human occupation in the upper Great Lakes is associated with a regional fluted point Paleo-Indian tradition which conventionally ends at the start of the Lake Stanley low water stage (Ellis, Kenyon, & Spence, 1990; Monaghan & Lovis, 2005; Shott, 1999). The terminal Paleo-Indian and Early and Middle Archaic populations that inhabited the region during Lake Stanley times would have experienced an environment that was colder and drier than present, with a spruce dominated forest (Croley & Lewis, 2006; Warner, Hebda, & Hahn, 1984). Sites associated with these time periods are rare. While some are found preserved beneath meters of later lake sediment (Lovis, 1989), it is generally assumed that most were lost as Lake Huron rose to its modern levels. Here we report on the first evidence for human activity on the Alpena-Amberley Land Bridge; a structure that during the Lake Stanley low water phase would have provided a land connection across the middle of modern Lake Huron linking northern Michigan with central Ontario.

Archaeologists have long recognized the potential for discovering sites of Pleistocene and early Holocene age in coastal areas that experienced repeated exposure and submergence, although these efforts have typically focused on marine environments that were subject to global changes in sea level. During the past year, investigators from the Museum of Anthropology and the Marine Hydrodynamics Laboratory at the University of Michigan have begun the task of testing whether human occupation sites are present on the Alpena-Amberley Ridge beneath Lake Huron. A particularly tantalizing possibility is the potential that stone constructions, such as caribou drive lanes, hunting blinds and habitation sites, of a kind only preserved in subarctic regions, might be visible on the lake bottom.

To discover sites within this setting, a multilayered search strategy was developed. The data used in this chapter were collected by a surface-towed side scan sonar and remote operated vehicles (ROVs). An example is shown in Figure 3 below. The side scan survey was conducted using a digital side scan sonar unit (Imagenex) at a frequency of 330 kHz and a depth of 30 m, mapping overlapping swaths of roughly 200 m. Targets of interest, identified from the acoustic survey, were examined using a remote operated vehicle (ROV). The current work utilized two mini-ROVs, a SeaBotix LBV 150 and an Outland 1000, which can be manually deployed from a small craft. Two pilot search areas have been covered, representing a total area of 72 sq km, at depths ranging from 12 to 150 m.

Based upon the results of the current survey and the corresponding reality game, we hope to acquire sufficient information to motivate a more detailed search using autonomous underwater vehicles (AUVs) and direct observation by archaeologists using SCUBA. The next section provides an overview of the software development methodology that we used to develop the research game.

THE AGILE METHODOLOGY

Why Agile?

Boehm and Turner (2004) suggest that the "home ground" for the use of Agile Program Design methodologies can be described in terms of the following factors. First, the system to be developed has low criticality relative to existing systems. Second, it involves the use of senior developers. Third, the requirements change often. Fourth, the number of developers is small, 5 to 10. Lastly, the target project domain is one that thrives on "chaos"; that is, it is a project of high complexity.

All of the factors are true in this case. The system has low criticality since there is no existing system that depends directly on its production. Also, the survey is currently independent of the
results. Thus, results of the software are not as the expert user. Developers communicate indi-
needed at this point to determine survey agendas. rectly with him via the video-streaming device.
Since this study is in a sense one of a kind, there Lectures and project discussions are recorded and
is very little a priori knowledge about what to the web links are given to all parties for reference.
expect from the survey, as well as what the agents Reynolds was the course instructor, and with a
based game should generate. research position in the Museum of Anthropol-
As for the fourth point, the develop group is ogy he was able to facilitate a dialogue with the
small, consisting of senior undergraduates and expert and the student developers.
graduate students. In addition, O’Shea functioned
Figure 4. A topographic map of the Lake Stanley Causeway extending from the mitten portion of Michigan
on the left to Ontario on the right. The three target survey regions along the causeway are highlighted.
Since this is a research project, which by definition is a "high risk" endeavor, there is much uncertainty about what can possibly be extracted from these surveys and inferred from the game program activities. So things tended to change from day to day as the survey commenced. This was in fact an exciting thing for the students to participate in. Between the lecture and lab there were 14 assignments, about one a week, which meant a constant work load for the students. However, all 10 students, both online and in class, stayed on to complete the project. This is quite an unusual occurrence, since during the course of a term there are many events that occur that are beyond the control of the student, yet they all delivered the final project.

Supporting the Agile Approach with Online Technology

Wood and Kleb demonstrated that agile methodologies, specifically Extreme Programming, developed primarily to support a business customer-developer relationship, can be extended to research-based projects (Wood & Kleb, 2002). Here we extend the application of agile methodologies to developing research-based programs using an online course environment within an academic setting. We feel that the addition of an online component to a game development course enhances the ability of the course to support an agile development methodology.

The best way to demonstrate how this enhances the application of agile methods is to describe how each of the basic features of the "Agile manifesto" (Ambler, 2008) is supported in the classroom here. The features and their implementation are as follows.

Rapid Continuous Delivery of Useful Software

The interval between deliveries was on average one week. Each new assignment added complexity to the previous assignment in a gradual fashion. How the layering of complexity was performed here is described in Section 4. The course instructor evaluated the submissions and passed the evaluated work and questions along to the expert through twice-weekly videos and one weekly face-to-face meeting.

Frequently Delivered Software

The goal was to have a working program each week that contained a new feature in addition to the features from previous ones. The online component allowed the instructor to critique submissions in terms of the requirements. This resulted in students "tracking" these issues and incorporating the desired features into subsequent programs, and in undesirable features being eliminated quickly throughout the group. While each programmer wrote their own program, they were able to share utility modules and graphics, which encouraged desirable features to spread. One student, for example, came up with a sound package that he distributed.

Working Software is the Principal Measure of Progress

The language for the class was Python, a text-based scripting language. The emphasis was on making the code as self-documenting as possible. While some documentation was required, such as object-oriented diagrams, the emphasis during the evaluations done in class was on performance, code quality, and readability. This made delivering a new program each week feasible for a student, given the reduced emphasis on additional documentation. The online component allowed the instructor to execute the submissions and provide feedback to students during the lecture as part of the critique.

The video streaming component of the course supported the change of requirements. New requirements can be motivated and described as part of the lecture, and students can go back and review the lecture video if they have problems.

Professor O'Shea is an expert in Great Lakes underwater archaeology and an enthusiastic advocate for the education of students about Michigan prehistory. This rubbed off on the students through the papers and documents that were provided as well.
line, to avoid obstacles, wolves, hunters, etc., and to separate into more than one herd if they were attacked by the hunters or the wolves.

Basic goals: Caribou, wolves, and humans all have a goal of survival, which requires a certain amount of food, water, and other resources daily. Humans, for example, will also need firewood for cooking and warmth, as well as stone for tool making.

Path Planning: A basic set of waypoints is located within the region. Individual agents from each category (human, caribou, or wolf) can plan their paths in order to avoid obstacles, move in herds, and achieve goals such as attaining food and water.

State Machines: Each category of agents inherits a state machine from its class. The state machine keeps track of its current state in terms of its food and resource needs, its active goals, and its perceived environment.

Learning capabilities: Cultural Algorithms have been successfully used to acquire social intelligence in a variety of applications and will be used to support organizational learning here (Reynolds & Ali, 2008; Reynolds, Ali, & Jayyousi, 2008).

The Terrain Object

At the onset of the project, there was some information about the general topography of the region, as shown by the computer generated GIS representation in Figure 6. However, prior to the survey it was not clear what level of detail might actually be found on the bottom; that is, whether there would be evidence of non-natural structures made by humans, as well as other evidence suggesting the past distribution of resources. As the survey proceeded, it became clear that there was sufficient evidence for campsites and hunting blinds to allow them to be included in the model, as well as behaviors that would utilize them. Therefore, as more information was garnered from the survey, the requirements for the game changed, since there was now more information that the game could be used to predict.
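The chapter does not list the terrain code, and the course implementation was written in Python. Purely as an illustration of the kind of extensible terrain object this section describes, a sketch along the following lines (all names invented here, not taken from the authors' code) could hold the GIS-derived topography together with survey-derived features such as blinds and campsites:

// Hypothetical grid-based terrain object; cell types can be extended as
// new survey evidence (hunting blinds, campsites) becomes available.
public class Terrain {
    public enum Cell { WATER, ROCK, SPRUCE_FOREST, OPEN_GROUND, HUNTING_BLIND, CAMPSITE }

    private final Cell[][] grid;
    private final double[][] elevation;   // from the GIS-derived topography

    public Terrain(int width, int height) {
        grid = new Cell[width][height];
        elevation = new double[width][height];
        for (int x = 0; x < width; x++)
            for (int y = 0; y < height; y++)
                grid[x][y] = Cell.OPEN_GROUND;
    }

    // New survey findings are added by re-labelling cells, so agent behaviors
    // that query the terrain pick up the features without further changes.
    public void addFeature(int x, int y, Cell feature) { grid[x][y] = feature; }

    public Cell cellAt(int x, int y)        { return grid[x][y]; }
    public double elevationAt(int x, int y) { return elevation[x][y]; }
    public boolean isPassable(int x, int y) { return grid[x][y] != Cell.WATER && grid[x][y] != Cell.ROCK; }
}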
At the beginning of the game design, hunting blinds and campsites were not considered, but once it was clear that there was sufficient evidence for them they were added into the game and the objects' behaviors adjusted to include them. Without an agile approach, synchronizing the project requirements with the survey results would have been very difficult to achieve. And, without the online component, it would have been difficult to effectively implement the agile technology in a situation that lacked the aspects of collocation that are critical to traditional agile approaches.

When the development of the game first began, campsites and hunting blinds were not considered. The preliminary results of the survey suggested that remains of such structures were present, so we added in those features along with the AI required to exploit them. We can then observe how adding this new information affects the behavior of the various agents. Figure 7 gives a screen shot of the main menu of the game containing three options. The three options are as follows: "Play Game" to load and start playing the game; "Options" to change the difficulty of the game; and "Exit" to close the game application. The options screen gives the user a chance to configure the game and to change the difficulty level of the game. The menu has four main options that can be useful in the modification of the number of wolves, caribou, and hunters.

RESULTS

Figure 8 gives a screen shot of the game environment containing spruce forest, water features, and rock formations, along with hunting blinds and campsites. One can see caribou herds, hunters, and wolves distributed over the landscape as well. In this figure wolves are attacking the caribou. Hunters are running to find ambush sites ahead of the herd. Notice the emergent property of the herd to split into smaller herds as a result of an attack. This behavior is observed in real caribou and emerged here as a result of the interaction of herd members. The semi-circle of rocks corresponds to a man-made hunting blind.

Figure 9 presents another emergent behavior produced by the model. In the figure the herd of caribou moves in a line, with one individual following another. This behavior is observed in real caribou and emerges here as a result of the individual movements of the herd members.

Figure 6. A GIS (geographical information system) representation of the topography of the land bridge

Figure 7. A screen shot of the main menu of the game. The user can start playing the game, change the difficulty level, and exit the game.
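The herd splitting and single-file movement reported above are emergent rather than scripted. As a hedged illustration only (the game itself was written in Python, and none of the rules or constants below are taken from it), behavior of this kind typically emerges from simple per-animal steering updates such as the following:

// Self-contained Java sketch: each caribou follows the animal ahead, keeps
// separation from neighbours, and flees nearby threats. Line formation and
// split-on-attack arise from these local rules, not from a herd-level script.
import java.util.ArrayList;
import java.util.List;

class Vec { double x, y; Vec(double x, double y) { this.x = x; this.y = y; } }

public class CaribouSteeringSketch {
    static Vec steer(Vec self, Vec leader, List<Vec> neighbours, Vec threat) {
        double dx = 0, dy = 0;
        // 1. Follow the animal ahead: pulls the herd into a line.
        dx += 0.6 * (leader.x - self.x);
        dy += 0.6 * (leader.y - self.y);
        // 2. Keep separation from close neighbours: prevents piling up.
        for (Vec n : neighbours) {
            double ddx = self.x - n.x, ddy = self.y - n.y;
            double d2 = ddx * ddx + ddy * ddy + 1e-6;
            if (d2 < 4.0) { dx += ddx / d2; dy += ddy / d2; }
        }
        // 3. Flee a nearby wolf or hunter: individuals on opposite sides of the
        //    threat flee in opposite directions, which is what splits the herd.
        double tx = self.x - threat.x, ty = self.y - threat.y;
        double td = Math.hypot(tx, ty);
        if (td < 10.0) { dx += 3.0 * tx / td; dy += 3.0 * ty / td; }
        return new Vec(dx, dy);
    }

    public static void main(String[] args) {
        List<Vec> herd = new ArrayList<>();
        for (int i = 0; i < 5; i++) herd.add(new Vec(i, 0.5 * i));
        Vec wolf = new Vec(2.0, 1.0);
        for (int i = 1; i < herd.size(); i++) {              // index 0 acts as the leader
            Vec v = steer(herd.get(i), herd.get(i - 1), herd, wolf);
            System.out.printf("caribou %d heading: (%.2f, %.2f)%n", i, v.x, v.y);
        }
    }
}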
Figure 8. A screen shot of the game as caribou move from the northwest to the southeast along a portion of the land bridge
Figure 10 is a screen shot of the caribou herd moving in a line as they head south while avoiding the water and rocks. They are also able to avoid hunters and wolves. This avoidance of objects gives the hunters an opportunity to produce drive lanes. These lanes force the herd into a linear pattern within a narrowing area, which makes hunting easier.

Figure 9. In this screen shot the caribou are moving in a line across the land bridge. Notice that caribou avoid the hunters in their path.

Figure 10. The caribou herd moves along the water's edge while avoiding the hunter

This emergent asymmetry provides us with new information that we can use to answer the following questions:

1. Is there a positioning of blinds and sites that exhibits equal kill probabilities for both north and south migrations?
2. How does this compare to the optimal positioning of blinds and sites for north migration or south migration alone?
3. If the north and south optimal locations are different, what are the differences and why?
4. For an observed archaeological configuration of sites and blinds on the land bridge, can we infer when they were used during the year?

For example, we placed the hunting blinds discovered by the archaeological survey in their proper positions on the land bridge. We then had the herd simulate southern and northern migrations respectively. The question of interest is whether the positioning of the blinds produces a better kill count for northern or southern migration. The presence of a significant difference may allow us to infer the season in which the blinds were most useful. As it turned out, blind placement was significantly more effective for north-to-south migration than for south-to-north migration. This suggests that the primary hunting season was in the fall and that the positioning of the currently observed hunting blinds was to support fall hunting. These are new insights that the fusion of survey and gaming approaches through an agile online technology has produced. The agile approach online has taken advantage of the synergy of the two different technologies to produce insights that neither one could have produced on its own.
Lewis, C. (2007). Journal of Paleolimnology, 37, 435-452. doi:10.1007/s10933-006-9049-y

Lovis, W. (1989). Michigan Cultural Resource Investigations Series 1. East Lansing.

Monaghan, G., & Lovis, W. (2005). Modeling Archaeological Site Burial in Southern Michigan. East Lansing, MI: Michigan State University Press.

Moore, T., Rea, D., Mayer, L., Lewis, C., & Dobson, D. (1994). Canadian Journal of Earth Sciences, 31, 1606-1617. doi:10.1139/e94-142

Reynolds, R. G., & Ali, M. (2008). Computing with the Social Fabric: The Evolution of Social Intelligence within a Cultural Framework. IEEE Computational Intelligence Magazine, 3(1), 18-30. doi:10.1109/MCI.2007.913388

Reynolds, R. G., Ali, M., & Jayyousi, T. (2008). Mining the Social Fabric of Archaic Urban Centers with Cultural Algorithms. IEEE Computer, 41(1), 64-72.

Shott, M. (1999). Cranbrook Institute of Science Bulletin, 64, 71-82.

Thomas, R., Kemp, A., & Lewis, C. (1973). Canadian Journal of Earth Sciences, 10, 226-271.

Warner, G., Hebda, R., & Hahn, B. (1984). Palaeogeography, Palaeoclimatology, Palaeoecology, 45, 301-345. doi:10.1016/0031-0182(84)90010-5

Wood, W., & Kleb, W. (2002). Extreme Programming in a research environment. In Wells, D., & Williams, L. (Eds.), XP/Agile Universe 2002 (pp. 89-99). doi:10.1007/3-540-45672-4_9
Chapter 12
Management of Distributed
Energy Resources Using
Intelligent Multi-Agent System
Thillainathan Logenthiran
National University of Singapore, Singapore
Dipti Srinivasan
National University of Singapore, Singapore
ABSTRACT
The technology of intelligent Multi-Agent System (MAS) has radically altered the way in which com-
plex, distributed, open systems are conceptualized. This chapter presents the application of multi-agent technology to the design and deployment of a distributed, cross-platform, secure multi-agent framework to model a restructured energy market, where multiple players dynamically interact with each other to achieve
mutually satisfying outcomes. Apart from the security implementations, some of the best practices in
Artificial Intelligence (AI) techniques were employed in the agent oriented programming to deliver
customized, powerful, intelligent, distributed application software which simulates the new restructured
energy market. The AI algorithm implemented as a rule-based system yielded accurate market outcomes.
celerating the commercialization of technologies and solutions for smart grids all over the world.

Multi-agent systems form one of the most exciting and fastest growing domains in agent oriented technology, which deals with the modeling of autonomous decision making entities. Multi-agent based modeling of a microgrid is the best choice to form an intelligent microgrid (Rahman, Pipattanasomporn, & Teklu, 2007; Hatziargyriou, Dimeas, Tsikalakis, Lopes, Kariniotakis, & Oyarzabal, 2005; Dimeas & Hatziargyriou, 2007), where each necessary element in a microgrid is represented by an intelligent agent that uses a combination of AI-based and mathematical models to decide on optimal actions.

Recent developments (Rahman, Pipattanasomporn, & Teklu, 2007; Hatziargyriou, Dimeas, Tsikalakis, Lopes, Kariniotakis, & Oyarzabal, 2005; Sueyoshi & Tadiparthi, 2007) in multi-agent systems have shown very encouraging results in handling multi-player interactive systems. In particular, the multi-agent system approach has been adopted to simulate, validate and test the open deregulated energy market in some recent works (Sueyoshi & Tadiparthi, 2007; Bagnall & Smith, 2005; Praça, Ramos, Vale, & Cordeiro, 2003; Logenthiran, Srinivasan, & Wong, 2008). Each participant in the market is modeled as an autonomous agent with independent bidding strategies and responses to bidding outcomes. These agents are able to operate autonomously and interact pro-actively within their environment. Such characteristics of agents are best employed in situations where role identities are to be simulated, as in a deregulated energy market simulation.

The dawn of the 21st century has seen numerous countries deregulating or lobbying for deregulation of their vertically integrated power industries. The electric power industry has seen an evolution from a regulated to a competitive industry. The whole industry of generation, transmission and distribution has been unbundled into individual competing entities. Although the journey has been far from seamless, as observed in the California electricity crisis (Budhraja, 2001), many critics have agreed that deregulation is indeed a noble endeavour. The problems associated with deregulation can be solved with structural adjustments to the markets and learning from past mistakes.

This chapter shows the development and implementation of a multi-agent application for the deregulated energy market. The developed application software is a testament to the multi-agent framework implementation and to the effectiveness of dynamic modeling of a multi-agent environment where the internal tasks of each agent are executed concurrently with external inputs from the agent world. Successful deployment of the application software, coupled with a high degree of robustness, indicates the relevance and operational level of multi-agent system based application software development. Users can apply the software to any size of power system by defining the number of agents in the system and inserting the associated information.

The structure of the remaining chapter is as follows: Section 2 provides an introduction to microgrids and Distributed Energy Resources (DER), and Section 3 gives an introduction to the restructured electricity market. Section 4 describes the implementation of the multi-agent system based application software for PoolCo energy market simulation. Section 5 demonstrates the flow of simulation of the implemented application software. Section 6 discusses the results of the PoolCo outcome for a sample microgrid. Finally, conclusions are drawn in the seventh section.
revolutionary changes. The deregulated energy environment has favoured a gradual transition from centralized power generation to Distributed Generation (DG), where sources are connected at the distribution network. Several technologies, such as diesel engines, micro turbines, fuel cells, wind turbines and photovoltaic systems, can be part of a distributed generation system. The capacity of the DG sources varies from a few kWs to a few MWs. Distributed systems can also bring electricity to remote communities which are not connected to the main grid. Such multiple communities can create a microgrid of power generation and distribution.

Microgrids can be defined as low voltage intelligent distribution networks comprising various distributed generators, storage devices and controllable loads, which can be operated as an interconnected system with the main distribution grid, or as an islanded system if they are disconnected from the main distribution grid. The common communication structure and distributed control of DG sources, together with controllable loads and storage devices such as flywheels, energy capacitors and batteries, are central to the concept of microgrids (Lasseter, Akhil, Marnay, Stephens, Dagle, Guttromson, Meliopoulos, Yinger, & Eto, 2002). From the grid's point of view, a microgrid can be regarded as a controlled entity within the power system that can be operated as a single aggregated load and a small source of power or ancillary services supporting the network. From the customers' point of view, microgrids are similar to traditional low voltage distribution networks which provide local thermal and electricity needs. In addition, microgrids enhance the local reliability, reduce emissions, improve the power quality by supporting voltage, and potentially lower the cost of energy supply.

Deregulated Energy Market

Around the world, the electricity industry, which has long been dominated by vertically integrated utilities, is experiencing major changes in the structure of its markets and regulations (Lasseter, Akhil, Marnay, Stephens, Dagle, Guttromson, Meliopoulos, Yinger, & Eto, 2002; Shahidehpour & Alomoush, 2001). The power industry has become competitive because the traditional centralized operation is being replaced with an open market environment. This transformation is often called the deregulation of the electricity market. Market structure varies from country to country depending on the policies adopted in the country. For example, the Independent System Operator (ISO) and the Power Exchange (PX) are separate entities in some countries' markets, such as California's market, although the PX functions within the same organization as the ISO, while they are under the same structure with control of the ISO in some other markets.

To implement competition, vertically integrated utilities are required to unbundle their retail services into generation, transmission and distribution. Generation utilities will no longer have a monopoly. Even small business companies will be free to sign contracts for buying power from cheaper sources. Many new concepts (Shahidehpour & Alomoush, 2001; Shahidehpour, Yamin, & Li, 2002) have appeared to facilitate the way of dealing with restructuring. A few critical roles of these entities and concepts, which are instrumental for understanding the multi-agent system based modeling of restructured energy markets, are discussed here.

Independent System Operator (ISO)

The ISO is an entity independent of the individual participants in the energy market, such as generation, transmission and distribution companies and end users. The ISO administers transmission tariffs, maintains the system security, coordinates maintenance scheduling, and has a role in coordinating long-term planning. The main purpose of an ISO is to ensure fair and non-discriminatory access to the grid, transmission lines and ancillary services. The ISO
manages the power flow over the transmission system and facilitates the reliability requirements of the power system. The ultimate role of the ISO is to ensure that the total generation meets the demand by taking congestion and ancillary services into account. This function is carried out by controlling the dispatch of flexible plants and by giving orders to adjust the power supply levels or curtail loads, to ensure that loads match the available power generation in the system.

Power Exchange (PX)

The transmission system is the most essential element in electricity markets. The secure and efficient operation of the transmission system is necessary for efficient operation of these markets. A TRANSCO has the role of building, owning, maintaining, and operating the transmission system in a certain geographical region to provide services for maintaining the overall reliability of the electrical system. The use of TRANSCOs' assets comes under the control of the ISO, and they are regulated to provide non-discriminatory connections and comparable services for cost recovery. The ISO oversees the operation and scheduling of TRANSCOs' facilities.

Generation Companies (GENCOs)

Generation companies are formed once the generation of electric power is segregated from the existing utilities. They take care of the operation and maintenance of existing generating plants. Electricity from them is either sold to short term markets or provided directly to the entities that have contracts with them for the purchase of electricity. Besides real power, they may sell reactive power and operating reserves. GENCOs include Independent Power Producers (IPP).

The agreements between trading parties are made based on the outcome of the energy market, which does not represent the actual power flow in the system. Constraints in the power system, for example transmission losses and contract transmission paths, affect the operation of the transmission system. Due to transmission line losses, the power injected at any node in the system to satisfy a certain demand at another node depends on the loss factors between the nodes. Transmission losses will affect the actual power injection pattern and quantity to the network.

Another issue that affects transmission pricing is the Contract Path, which has been used between transacted parties as dedicated paths where power flows are assumed to flow through pre-defined
paths. However, physically, electrons could flow in a network over parallel paths owned by several utilities that may not be through the contract path. As a result, transmission owners need to be compensated for the actual use of their facilities.

The above are just two of the many implications of how power system constraints affect pricing in a restructured market. Though it is beyond the scope of this discussion, and also beyond the scope of this application development, managing such constraints and their impacts on pricing is essential.

Market Models

The main objectives of an electricity market are to ensure secure and efficient operation and to decrease the cost of electricity through competition. Several market structure models (Praça, Ramos, Vale, & Cordeiro, 2003; Shahidehpour & Alomoush, 2001; Shrestha, Song, & Goel, 2000) exist all over the world. These market models differ in terms of marketplace rules and governance structure. Generally they can be classified into three types: the PoolCo model, the Bilateral contract model and the Hybrid model.

The PoolCo market model is a marketplace where power generating companies submit their production bids and consumer companies submit their consumption bids. The market operator uses a market clearing tool to find the market clearing price and the accepted production and consumption bids for every hour. Bilateral contracts are negotiable agreements between sellers and buyers about power supply and reception. The bilateral contract model is very flexible because negotiating parties can specify their own contract terms and conditions. Finally, the third market model is the hybrid model, which is a combination of the PoolCo and Bilateral contract models. It has the features of the PoolCo as well as the Bilateral contract model. In this model, customers can either negotiate with a supplier directly for a power supply agreement or accept power from the pool at the pool market clearing price. For this software development, the ISO and PX are modeled as separate entities, as in California's energy market (Shahidehpour & Alomoush, 2001; Shahidehpour, Yamin, & Li, 2002), to illustrate their individual roles in the energy market, and the typical day-ahead PoolCo model is chosen because of its simplicity.

The PoolCo model (Shrestha, Song, & Goel, 2000) consists of competitive independent power producers, vertically integrated distribution companies, load aggregators and retail marketers. The PoolCo does not own any part of the generation or transmission utilities. The main task of the PoolCo is to centrally dispatch and schedule generating units in the service area within its jurisdiction. The operating mechanism of the PoolCo model is described in Figure 1. In a PoolCo market operation, buyers (loads) submit their bids to the pool in order to buy power from the pool, and sellers (generators) submit their bids to the pool in order to sell power to the pool. All generators have the right to sell power to the pool, but they cannot specify customers.

During PoolCo operation, each player submits their bids to the pool, which is provided by the PX. The PX sums up these bids and matches the interested demand and supply of both sellers and buyers. The PX then performs economic dispatch to produce a single spot price of electricity for the whole system. This price is called the Market Clearing Price (MCP), which is the highest price among the selected bids of the particular PoolCo simulation hour. Winning generators are paid the MCP for their successful bids, while successful loads are obliged to purchase electricity at the MCP. Generators compete to sell power: if the bids submitted by generator agents are too high, they have a low possibility of selling power. Similarly, loads compete to buy power: if the bids submitted by load agents are too low, they have a low possibility of getting power. In such a model, generator bids with low cost and load bids with high cost are essentially rewarded.
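Although the chapter does not reproduce the clearing code, the PoolCo rule just described can be sketched as follows: supply bids are sorted by ascending price, demand bids by descending price, bids are matched until the curves cross, and the MCP is taken here as the price of the most expensive supply bid that is accepted (one reading of "the highest price among the selected bids"). All class names and numbers below are illustrative assumptions, not the authors' implementation.

// Hedged illustration of a PoolCo-style market clearing step.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MarketBid {
    final String participant;
    final double quantityKW;
    final double price;
    MarketBid(String participant, double quantityKW, double price) {
        this.participant = participant;
        this.quantityKW = quantityKW;
        this.price = price;
    }
}

public class PoolCoClearingSketch {
    /** Returns the market clearing price (MCP) for one simulation hour. */
    static double clear(List<MarketBid> sellers, List<MarketBid> buyers) {
        sellers.sort(Comparator.comparingDouble(b -> b.price));                        // cheapest supply first
        buyers.sort(Comparator.comparingDouble((MarketBid b) -> b.price).reversed());  // highest demand first
        double[] sellLeft = sellers.stream().mapToDouble(b -> b.quantityKW).toArray();
        double[] buyLeft  = buyers.stream().mapToDouble(b -> b.quantityKW).toArray();
        double mcp = 0;
        int s = 0, d = 0;
        // Keep matching while the next buyer still pays at least the next seller's price.
        while (s < sellers.size() && d < buyers.size() && buyers.get(d).price >= sellers.get(s).price) {
            double q = Math.min(sellLeft[s], buyLeft[d]);
            sellLeft[s] -= q;
            buyLeft[d]  -= q;
            mcp = Math.max(mcp, sellers.get(s).price);   // price of the most expensive accepted supply bid
            if (sellLeft[s] == 0) s++;
            if (buyLeft[d]  == 0) d++;
        }
        return mcp;
    }

    public static void main(String[] args) {
        List<MarketBid> sellers = new ArrayList<>(Arrays.asList(
                new MarketBid("G1", 50, 20.0), new MarketBid("G2", 40, 27.0)));   // toy numbers only
        List<MarketBid> buyers = new ArrayList<>(Arrays.asList(
                new MarketBid("L1", 30, 30.0), new MarketBid("L2", 35, 25.0)));
        System.out.println("MCP = " + clear(sellers, buyers));
    }
}

The quantity of electricity transacted at the MCP could be accumulated in the same loop, which mirrors the role the text assigns to the SC agent later in the chapter.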
Finally, the Power System package comprises the data structures used to represent the state of the physical power system network.

Agents Package

The Agents package in this framework consists of several agents, such as the ISO, PX, Schedule Coordinator, PoolCo Manager, Power System Manager, Security Manager, Sellers and Buyers. This application software focuses only on the restructured energy market simulation. The different entities interact in this framework to simulate a day-ahead market. A test system with three sellers and five buyers is considered for the case study; however, it can be extended to any number of market participants. Figure 5 shows some of the main agents implemented in this framework.

PX Agent: The PX agent has been customized for the purpose of modeling the restructured energy market. The PX acts as a middle man between the various market participants and the ISO. Market participants submit their bids to the pool. The PX performs scheduling through the Schedule Coordinator (SC) agent. The schedule coordinator collates all the bids and determines a schedule using the market clearing engine. For any particular day schedule, the PX also scans for violating bids, which are sent back by the ISO. If any violated bids are received, the PX disseminates this information back to the relevant market participants.

ISO Agent: The ISO in this framework performs the roles of a regulatory body. The ISO seeks to ensure the authenticity of the bids and the stability of the network. Bids are checked for violations and acted upon accordingly. The ISO conducts these checks with the help of a rule based engine customized for the ISO. In addition, network simulations are carried out on the day schedules to ensure the stability of the network with the power world simulator. The ISO also maintains a database of day schedules. As mentioned earlier, the ISO has the broad role of seeing to the system operation and the stability of the network in this project.

Security Manager Agent: The security manager is an overall central information security hub that provides all encryption, decryption, encryption key generation, issuing of digital certificates and other security related services to the agent world. All agents have to register with the security manager to make use of the security services in the network. As all message transmission is done through the Secured Socket Layer (SSL) protocols, agents which do not register with the security manager have no access to the SSL service; thus they will not be able to communicate with any other agents in the network.

Authorized agents have a valid ID in the agent world, and these IDs are used to register with the security manager. In this application software,
a security architecture for intelligent agents is employed in the network, as shown in Figure 6. The security manager has exclusive access to the message encoding agent and the channel encoding agent, which are mainly responsible for providing message and channel encryption services to all agents. All agents who wish to send messages across the network have to engage their services to encrypt the message and the channel through which it will be sent. The agents need to contact the security manager agent; upon authentication, the message encoding agent and the channel encoding agent provide security services to the agents. The message encoding agent provides the encryption service for the sending agent after receiving an encryption order from the security manager, and the channel encoding agent encrypts the message channel between the sending and receiving agents after receiving an encryption order from the security manager. When the message is successfully encrypted, the sending agent sends the encrypted message through the encrypted channel.

Such an architecture provides double redundancy in communications security. For any hacker to successfully decrypt the messages sent between two agents, it is necessary to break the encryption of the channel and then of the message itself. This is difficult to achieve for the following reasons.

• The encryption process is done by two separate entities (the message encoding agent and the channel encoding agent). Unlike systems with only one encrypting entity, where all encryption is done centrally, it takes twice the effort to search for or guess a key, since key generation is done by two separate entities.

• The channel encryption provides a dual level of security. Every time a message is sent between two agents, a new secured channel is established. The encryption key used to establish the secured channel is always different. Since the channel encryption is always different, the key value for decryption is also always different. This makes it even harder for unauthorized interception of messages to succeed.

Behaviour Package

In multi-agent systems, agents of different natures are endowed with different attributes, beliefs and objectives. These behaviours are added to the internal tasks of agents. A behaviour is basically an event handler. Behaviours in the agents are executed in an order arranged by the JADE internal behaviour scheduler. This scheduler is responsible for organizing and arranging all the behaviours of each and every agent. Figure 7 shows some of the main behaviours implemented in this framework.
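As a minimal illustration of how such behaviours are attached to an agent and interleaved by the JADE scheduler (agent and task names below are invented; this is not the chapter's code), an agent can queue a one-shot task and a periodic task and leave their ordering to the scheduler:

// Illustrative JADE agent with two behaviours; the scheduler runs each
// behaviour's action to completion and interleaves them non-preemptively.
import jade.core.Agent;
import jade.core.behaviours.OneShotBehaviour;
import jade.core.behaviours.TickerBehaviour;

public class ExampleAgent extends Agent {
    protected void setup() {
        // Runs once, e.g. an initial registration or parameter-loading task.
        addBehaviour(new OneShotBehaviour(this) {
            public void action() {
                System.out.println(myAgent.getLocalName() + ": initialization task executed");
            }
        });
        // Runs periodically (every 5 seconds here); interleaved with the
        // agent's other behaviours by the internal scheduler.
        addBehaviour(new TickerBehaviour(this, 5000) {
            protected void onTick() {
                System.out.println(myAgent.getLocalName() + ": periodic internal task");
            }
        });
    }
}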
Ontology Package

The concept of an ontology can be imagined as the vocabulary of the agent world for communication. By defining any information as a specific ontology, it becomes something like our spoken language. Once an agent determines what language a piece of information is coded in, it tries to retrieve an ontology "decoder" from the ontology package. With the help of these "decoders", the receiving agent, though not speaking the native language, is able to understand the other agent. Figure 8 shows some of the main ontologies implemented in this framework.

Returning to the software programming of the implementation: when an agent sends information to another agent, it is serialized in a fashion which is normally not understood by the other agent unless it knows what ontology is used. Once the agent knows which class of ontology the information belongs to, it is able to "decode" and "reassemble" the information in a meaningful way such that it can read useful information from it. Some of the ontologies implemented in this framework are given in detail below.

Bid: This is a class for all bids submitted by sellers and buyers to the PoolCo. These bids contain the bid price, quantity and much other
Tools Package
Market Participants | Bid Received Time    | Bid Received for     | Quantity | Price
Buyer 1             | 01-04-2009 12:00 PM  | 01-04-2009 11:00 PM  | 379.0    | 27.7
Seller 2            | 31-04-2009 11:00 AM  | 01-04-2009 03:00 PM  | 420.0    | 28.7
Seller 1            | 01-04-2009 12:00 PM  | 02-04-2009 11:00 PM  | 95.0     | 32.0
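Bid records of the kind shown in the table above can be represented by a simple ontology concept class. The following sketch mirrors the table columns; the field names are assumptions rather than the authors' exact schema, and implementing jade.content.Concept merely allows such an object to be carried as structured content of an ACL message:

// Minimal Bid record mirroring the table columns (illustrative field names).
import jade.content.Concept;

public class Bid implements Concept {
    private String participant;   // e.g. "Buyer 1" or "Seller 2"
    private String receivedTime;  // when the bid arrived at the pool
    private String biddingFor;    // the trading slot the bid applies to
    private double quantity;      // offered or requested quantity
    private double price;         // bid price for that quantity

    public String getParticipant()          { return participant; }
    public void   setParticipant(String p)  { participant = p; }
    public String getReceivedTime()         { return receivedTime; }
    public void   setReceivedTime(String t) { receivedTime = t; }
    public String getBiddingFor()           { return biddingFor; }
    public void   setBiddingFor(String t)   { biddingFor = t; }
    public double getQuantity()             { return quantity; }
    public void   setQuantity(double q)     { quantity = q; }
    public double getPrice()                { return price; }
    public void   setPrice(double p)        { price = p; }
}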
Content Slot Coding: This is the main encoding and encryption engine used for scrambling messages sent from one agent to another. The message is first serialized and then encoded using Base64.jar, which is the encoding file used by JADE for serializing and transmitting sequences of bytes within an ACL message. Further, it is encrypted using the OpenSSL package to apply strong encryption to the message, which is used in the RSA algorithm.

Coordination between Agents

The coordination between agents is an important issue in a MAS. In an energy market model, the agents coordinate (Koritarov, 2004; Krishna & Ramesh, 1998; Krishna & Ramesh, 1998a) among themselves in order to satisfy the energy demand of the system together with the distributed control of the system. The coordination strategy defines the common communication framework for all interactions between agents. Simple contract-net coordination is chosen for the processing of the wholesale market because it is one of the simplest coordination strategies. All discussions between agents are started simply by a requesting agent asking the other agents for a proposed contract to supply some commodity, and then awarding contracts from the returned proposals in a fashion that minimizes cost or fulfils some other goal. The disadvantage of simple contract-net coordination is that it allows only simple negotiation, without counter proposals. Effectively, the initiating agent has to pick from the presented contracts and cannot negotiate the price. The advantage of contract-net is that it distributes computing, allowing the specific agent which started a contract-net process to be responsible for evaluating bids and deciding, based on its own rules, which contracts to accept. It also separates the internal information of agents from one another, since agents only communicate through the defined contract-net protocol and all calculations are done internally by each agent. Since the agents can change at every contract-net cycle, there is no dependence on a specific agent. A system with more complex negotiation might lead to lower costs for the system. However, simple contract-net is sufficient to demonstrate a distributed coordination framework.

A directory service allows agents to register themselves and publish their capabilities. By using a directory service, agents do not have to be aware of the other agents. For example, a load agent will look up sources in the directory every time it wishes to secure a new supply contract. This allows agents to be added to or removed from the system at any time, since agents are included in contract-net negotiations once they register themselves with the directory service. The coordination layer that the approach defines is the strategic layer above the real time layer. Because of the time required for a contract-net interaction to complete, and since contracts are assigned in discrete time intervals, this coordination layer
cannot address real time issues. The coordination layer allows the distributed agents to plan how resources should be applied to satisfy demand. The actual operation of the system components self-regulates through negative feedback, since the system cannot produce more energy than is consumed. Figure 13 shows the overall communication between the agents in this simulation.

SIMULATION OF DEVELOPED SOFTWARE

The multi-agent framework and generic service components implemented in the software are integrated to deploy a simulation application modeling the restructured energy market. This simulation framework consists of four main states, namely: agent world creation and initialization, non-contractual binding and SSL communications, contractual binding communications, and finalization and sealing of contracts. The general flow of the software simulation can be seen in Figure 14.

Figure 15 shows the multi-agent system launching. It is started via the agent launch pad, by which all the administrative agents, such as the PoolCo manager agent and its subordinate agents, the security manager agent and its subordinate agents, and the power system manager agent, are launched. Buyer agents and seller agents are created and launched as static agents on a local machine.

After the seller agents and buyer agents are created, they execute their own threads to initialize their generation capacities, load requirements and bidding prices by obtaining these values from a centrally held database in the simulation environment. When all the parameters of the agents are properly initialized, each agent will autonomously
register itself with the DF as its first task. The process is illustrated in Figure 16.

As soon as the agents register themselves with the DF, they query the DF for a complete listing of the agents and their services in the network using certain search constraints. These search constraints are usually queries to the DF for agents with certain types of services or agents with certain types of names. Sellers send a query to the DF for all other buyer agents and the PoolCo manager agent. Buyers also send a query to the DF for all other seller agents and the PoolCo manager agent. The DF responds with a listing of all agents that match the search constraints, along with the physical addresses of these agents. With this information, agents are able to autonomously contact them in their own thread of execution. After retrieving the necessary directory listing of the various agents in the network, each agent contacts the security manager for the allocation of ID keys to be used for encryption purposes and for the SSL algorithm engine, as shown in Figure 17.

As soon as the agents have registered for security services, all further communication on the network is encrypted. When the PoolCo is ready to communicate with all player agents, it broadcasts an OPEN message, as shown in Figure 13. All the player agents who wish to take part in this round of bidding respond by sending a SUBSCRIBE message to subscribe to the PoolCo manager agent. The PoolCo closes the subscription window after everyone in the network has subscribed or when the subscription expiry date is reached, whichever is earlier.
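A hedged JADE sketch of the registration and subscription steps just described might look as follows; the service type, message contents and class name are assumptions, not the chapter's actual code:

// Minimal JADE sketch of a player agent registering with the DF and
// answering the PoolCo's OPEN broadcast with a SUBSCRIBE message.
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;
import jade.lang.acl.ACLMessage;

public class PlayerAgent extends Agent {
    protected void setup() {
        // 1. Register this agent and its service with the Directory Facilitator (DF).
        DFAgentDescription dfd = new DFAgentDescription();
        dfd.setName(getAID());
        ServiceDescription sd = new ServiceDescription();
        sd.setType("energy-market-player");           // assumed service type
        sd.setName(getLocalName() + "-bidding");
        dfd.addServices(sd);
        try {
            DFService.register(this, dfd);
        } catch (FIPAException e) {
            e.printStackTrace();
        }
        // 2. React to the PoolCo's OPEN broadcast with a SUBSCRIBE reply.
        addBehaviour(new CyclicBehaviour(this) {
            public void action() {
                ACLMessage msg = myAgent.receive();
                if (msg != null && "OPEN".equals(msg.getContent())) {
                    ACLMessage reply = msg.createReply();
                    reply.setPerformative(ACLMessage.SUBSCRIBE);
                    reply.setContent("SUBSCRIBE");
                    myAgent.send(reply);
                } else {
                    block();   // wait until the next message arrives
                }
            }
        });
    }
}

On the PoolCo side, a symmetric DFService.search call with a matching ServiceDescription template would return the registered players to which the OPEN broadcast is addressed.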
Once everyone in the network has subscribed to the PoolCo manager agent, the PoolCo manager agent issues an AGREE message to all agents who have signed up for the subscription service. This reply confirms their subscription. When the AGREE message arrives, player agents stop their own internal execution of whatever task they are involved with (e.g. receiving updates from the DF for the listing of agents in the network, or manipulating input data from the DF) to handle the newly arrived message, and they record this correspondence in their internal message history database. All message exchanges are recorded by each and every agent in their own internal message history database. After that, they resume whatever they were doing before. At the same time, they continue to listen for new messages.

After the PoolCo manager agent has sent out an AGREE message to every agent who sent a subscription message, it also updates its own internal message history database and proceeds to prepare a call for proposal broadcast. Once it has prepared the Call For Proposal (CFP), it retrieves the list of subscribed agents in the network and sends a CFP message to all the subscribers. After the PoolCo manager agent has sent the CFP message, it also sends a message to the ISO agent, PX agent and SC agent to initialize them and prepare for the imminent auction and scheduling tasks. The PX agent and SC agent are the entities used to model the internal division of PoolCo management. The PoolCo manager agent is the front-door communication entity representing the bidding system.

The CFP message arrives at the player agents who have previously subscribed to the PoolCo service. Upon receiving this message, agents stop their execution as before and handle the newly arrived message. The player agents prepare themselves for bidding if they are interested in participating in this round of bidding. In this stage, player agents submit their formal bids to the PoolCo manager agent. The PoolCo manager agent processes these bids and sends the results to them. These submissions of bids and the replies from the PoolCo manager agent are legally binding contracts. Buyers who submitted bids to buy are legally obligated to buy the quantity of power at the bid price. The same applies to sellers. Agents who are interested in submitting bids have access to their internal bidding records. They prepare the necessary parameters, such as the price and quantity of electricity to buy or offer in the market, and the prepared parameters are encoded as a bid object. When encoding is completed, they send a PROPOSAL message to the PoolCo manager agent with the bid object enclosed. The PoolCo manager agent, upon receiving the PROPOSAL messages, redirects them to the PX agent for recording. The PoolCo manager agent will
only close the proposal window after everyone in the network has submitted their proposals or the proposal window expiry date is due, whichever is earlier. The proposal expiry date is by default one day after the PoolCo manager agent sends out its CFP message. After the proposal window is closed, the PX agent processes the collected bids with a series of sorting and tagging steps. The whole set of data is hashed into a hashtable. Then this hashtable is sent to the SC agent. At the same time, the SC agent sends a message to the PoolCo manager agent to notify it that scheduling is in progress.

The SC agent has an algorithm which computes a data structure representing the aggregated demand and the aggregated supply with respect to the price component, using a rule based system. These sets of data are processed to produce a single spot price at the market equilibrium where the demand meets the supply. This price is called the Market Clearing Price (MCP). The algorithm also calculates the quantity of electricity transacted at this price. The PX agent also determines the successful buyer agents and seller agents in this round of bidding, based on the MCP and the quantity of electricity transacted. Then the whole set of data is sent to the ISO agent to check for bidding violations as well as for congestion in the scheduling. If any bid is in violation or the scheduling is congested, the ISO takes the necessary actions. Otherwise, the whole set of data, comprising the MCP, the quantity of electricity transacted, the list of successful buyer agents and seller agents, and the list of unsuccessful buyer agents and seller agents, is sent to the PoolCo manager agent.

After receiving this data, the PoolCo manager agent extracts the relevant information and sends it to the power system manager agent so that the power system manager can update the power system state. The PoolCo also extracts the list of successful bidders from the set of data and sends an ACCEPT PROPOSAL message to the successful bidders, embedded with the details of the successful bids. The PoolCo also extracts the list of unsuccessful bidders from the data and sends a REJECT PROPOSAL message to the unsuccessful bidders. All bidders are notified of their bidding outcomes at the end of every bidding round.

Agents who receive an ACCEPT PROPOSAL message record their successful bid object and update their internal records. Then they send an OK message to the PoolCo manager agent to acknowledge the contract. Agents who receive a REJECT PROPOSAL message record their unsuccessful attempt and make changes to their internal records.

This whole process is one round of bidding in the PoolCo model for one slot. In the case of the day-ahead market, it is for one hour of the 24 hour slots. Agents usually submit a complete schedule of 24 bids, representing their bids for the day-ahead market.
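The player side of the bidding round just summarized follows a contract-net-like exchange (CFP, then PROPOSE, then ACCEPT PROPOSAL or REJECT PROPOSAL). A minimal JADE sketch, with assumed performatives and placeholder content, is given below; it is illustrative only and does not reproduce the authors' code:

// Illustrative JADE behaviour for the player side of one bidding round.
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;

public class BiddingRoundBehaviour extends CyclicBehaviour {
    public void action() {
        ACLMessage msg = myAgent.receive();
        if (msg == null) { block(); return; }
        switch (msg.getPerformative()) {
            case ACLMessage.CFP: {
                // Prepare the bid parameters (price, quantity) and reply with a proposal.
                ACLMessage proposal = msg.createReply();
                proposal.setPerformative(ACLMessage.PROPOSE);
                proposal.setContent("price=28.7;quantityKW=420");  // placeholder bid encoding
                myAgent.send(proposal);
                break;
            }
            case ACLMessage.ACCEPT_PROPOSAL: {
                // Successful bid: record it and acknowledge the binding contract.
                ACLMessage ack = msg.createReply();
                ack.setPerformative(ACLMessage.INFORM);
                ack.setContent("OK");
                myAgent.send(ack);
                break;
            }
            case ACLMessage.REJECT_PROPOSAL:
                // Unsuccessful bid: update internal records for the next round.
                break;
            default:
                break;
        }
    }
}

The chapter's "OK message" is rendered here as an INFORM with content "OK"; in a full day-ahead run this behaviour would repeat once per hourly slot of the 24-bid schedule.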
RELIABILITY CHECKING OF MICROGRID

Once the market is simulated, before the scheduling is proposed, the stability and reliability of the power network are checked using the power world simulator, in order to ensure that the scheduling does not undermine the stability and reliability of the system. Power world simulator is a commercial power system simulation package based on a comprehensive, robust power flow solution engine which is capable of efficiently solving systems of up to 100,000 buses. It also allows the user to visualize the system through the use of animated diagrams, providing good graphical information about the technical and economic aspects of the network. A snapshot of the power world simulator is shown in Figure 18. It has several optional add-ons. The OPF and SimAuto add-ons are integrated in this software development.

The optimal power flow (OPF) provides the ability to optimally dispatch the generation in an area or group of areas while simultaneously enforcing the transmission line and interface limits. The advantages of this OPF over other commercially available optimal power flow packages are its ability to display the OPF results on system one-line diagrams and contour the results for ease of interpretation, and the ease with which users can export the OPF results to a spreadsheet, a text file, or a power world AUX file for added functionality. SimAuto is an automated server that enables the user to access these functionalities from an externally written program via the Microsoft
Component Object Model (COM) technology. Even though Java does not have COM compatibility, Java provides the Java Native Interface (JNI), a standard programming interface between Java programs and native code such as COM objects. JNI allows the Java virtual machine to share a process space with platform native code.

If the schedule generated for the microgrid results in congestion, the ISO employs the power world simulator to mitigate the congestion. The purpose of the OPF is to minimize the cost function by changing system controls and taking into account both equality and inequality constraints, which are used to model the power balance constraints and various operating limits. It functionally combines the power flow with economic dispatch. In the power world simulator, the optimal solution is determined using linear programming. Once congestion has been mitigated, the new network schedule and the relevant network information are extracted from the power world simulator.

RESULTS AND DISCUSSION

The development and implementation of a multi-agent application for restructured energy markets has been presented in this chapter. The developed multi-agent application software simulates the restructured energy markets with accurate results. Further, this is a testament to the multi-agent framework implementation and the effectiveness of dynamic modeling of a multi-agent environment where the internal tasks of each agent are executed concurrently with external inputs from the agent world. Successful deployment of the application software, coupled with high robustness, indicates the relevance and operational level of multi-agent system based application software development. Users can apply the software to any size of power system by defining the number of agents in the system and inserting the associated information.

Figure 19 shows a demonstration of the developed software simulation. The Remote Monitoring Agent (RMA) console can be run in the JADE runtime environment, where the developed agents in the framework can be monitored and controlled. The other graphical tools, such as the dummy agent, sniffer agent and introspector agent, which are used to monitor, debug and control the MAS programming, can be activated from the RMA. In the figure, the sniffer agent is activated and the successful implementation of the agents' communication is observed.
Several different scenarios of the double sided bidding PoolCo market are simulated. The scenarios are defined as follows: scenario 1 is defined for a case where excess demand is available at the MCP, which is illustrated in Figure 20; scenario 2 is defined for a case where excess supply is available at the MCP, which is illustrated in Figure 21; and scenario 3 is defined for a case where supply and demand are matched at the MCP, as illustrated in Figure 22.

Table 2 shows the above scenarios numerically. In scenario 1, at the market equilibrium, the bidding quantity of Load 1 is 10 kW whereas the successful market output is only 5 kW. Therefore an additional 5 kW of power is necessary for Load 1. Here, the excess demand of 5 kW is available at
the market equilibrium. In scenario 2, at the market equilibrium, the bidding quantity of Pgen 1 is 70 kW whereas the successful market output is only 65 kW. Therefore 5 kW of power is available at Pgen 1. Here, the excess supply of 5 kW is available at the market equilibrium. In scenario 3, at the market equilibrium, the bidding quantity of Load 2 is 20 kW and the successful market output is also 20 kW. Here, the supply and the demand are exactly matched at the market equilibrium.

The agent platform (JADE) used in the software development is a FIPA compliant platform. In the implementation of the agent oriented design, it strictly follows the FIPA standards to ensure interoperability with future FIPA-compliant systems as well. JADE is a fully Java coded platform. It shows complete cross platform portability, having been tested on UNIX, Linux, Windows 95, 98, 2000, XP and Vista machines.

CONCLUSION

This chapter has presented a multi-agent software development to simulate the ISO/PX operations for restructured energy markets. This is done with an agent oriented programming methodology to deliver customized and powerful application software. The simulation of this software demonstrates the successful development and implementation of the multi-agent framework, the feasibility and effectiveness of a multi-agent platform to model the restructured energy markets, and the roles of the ISO and PX in particular in carrying out the market operations.

This application is a fully cross platform, FIPA compliant software written in the Java language. The application is made up of various Java packages, giving future programmers the ability to work with both ready-made pieces of functionality and abstract interfaces for custom and application tasks. Further, the attractive features of Java, in particular its cross platform deployment, security policies and provisions for distributed computing through Remote Method Invocation (RMI) and sockets, have benefited this software development.

ACKNOWLEDGMENT

The funding for this project was received from the SERC IEDS programme grant R-263-000-507-306.
system which provides competition and open access to all users in the interconnection.
PoolCo Market Model: One of the market models in a restructured power system. PoolCo is a centralized marketplace that clears the market for buyers and sellers according to the bids of sellers and buyers.
Rule-Based System: One of the ways to store and manipulate knowledge to interpret information in a useful way.
Distributed Energy Resource: Distributed generation technology with distributed storage and controllable loads; their combination is referred to as a distributed energy resource.
Coordination Between Agents: In multi-agent systems, an agent usually plays a role with cooperative or competitive behaviours with other agents. Therefore communication between agents is necessary in a multi-agent system.
Reliability of Power System: Concerns whether sufficient generation and transmission resources are available to meet projected demand and the status of the system after outages or equipment failures. Reliable power system operation must satisfy voltage constraints and keep power flows within thermal limits.
Section 6
Multi-Agent Learning
Chapter 13
Effects of Shaping a Reward on Multiagent Reinforcement Learning
Sachiyo Arai
Chiba University, Japan
ABSTRACT
The multiagent reinforcement learning approach is now widely applied to cause agents to behave rationally in a multiagent system. However, due to the complex interactions in a multiagent domain, it is difficult to decide each agent's fair share of the reward for contributing to the goal achievement. This chapter reviews a reward shaping problem that defines when and what amount of reward should be given to agents. We employ keepaway soccer as a typical multiagent continuing task that requires skilled collaboration between the agents. Shaping the reward structure for this domain is difficult for the following reasons: i) a continuing task such as keepaway soccer has no explicit goal, and so it is hard to determine when a reward should be given to the agents, ii) in such a multiagent cooperative task, it is difficult to fairly share the reward for each agent's contribution. Through experiments, we found that reward shaping has a major effect on an agent's behavior.
requires a different time period, it is appropriate to model this problem as a semi-Markov decision process.
To our knowledge, designing the reward function has been left out of reinforcement learning research, even though the reward function introduced by Stone (2005) is commonly used. However, designing the reward function is an important problem (Ng, 2000). As an example, the following are difficulties of designing a reward measure for keepaway. First, it is a continuing task that has no explicit goal to achieve. Second, it is a multiagent cooperative task, in which there exists a reward assignment problem to elicit desirable teamwork. Because of these two features of keepaway, it is hard to define the reward signal of each keeper so as to increase the time of ball possession by the team. It should be noted that increasing the reward of each keeper does not always lead to increased possession time by the team.
In the case of a continuing task, we can examine a single-agent continuing task such as the pole balancing task, in which one episode consists of a period from the starting state to the failure state. If the task becomes a failure, a penalty is given, and this process can be used to evaluate teamwork and individual skills. In contrast, in the case of a multiagent task, which includes both a teammate and at least one opponent, it is hard to tell who contributes to the task. In a multiagent task such as keepaway, it is not always suitable to assign positive rewards to agents according to the amount of time cycles of each agent. Appropriately assigning an individual reward to each agent will have a greater effect on cooperation than sharing a common reward within the team. But, if the individual reward is not appropriate, the resulting performance will be worse than that obtained by sharing a common reward. Therefore, assigning an individual reward to each agent can be a double-edged sword. Consequently, our focus is on assigning a reward measure that does not have a harmful effect on multiagent learning.
The rest of this chapter is organized as follows. In the next section, we describe the keepaway soccer domain and discuss its features from the viewpoint of reinforcement learning. In Section 3, we introduce the reinforcement learning algorithm we applied and our reward design for keepaway. Section 4 shows our experimental results, including the acquired behavior of the agents. In Section 5, we discuss the applicability of our reward design to reinforcement learning tasks. We state our conclusion and future work in Section 6.

PROBLEM DOMAIN

Keepaway Soccer

Keepaway (Stone, 2002) is known as a subtask of RoboCup soccer, and it provides a great basis for discussion of important issues of multiagent systems. Keepaway consists of keepers who try to keep possession of the ball, and takers who attempt to take possession of the ball within a limited region. The episode terminates whenever takers take possession or the ball runs out of the region, and then players are reset for a new episode. When takers keep the ball for more than four cycles of simulation time, they are judged to have gained ball possession successfully.
Figure 1 shows the case of three keepers and two takers (3 vs. 2) playing in a region of size 20×20[m]. Here, keeper K1 currently has the ball, K2 is the closest to K1, and K3 is the next closest, and so on, up to Kn when n keepers exist in the region. In a similar way, T1 is the closest taker to K1, T2 is the next closest one, and so on, up to Tm, when m takers exist in the region.

Macro-Actions

In the RoboCup soccer simulation, each player executes a primitive action, such as a turn (angle), dash (power) or kick (power, angle), every 100[ms].
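As a hedged illustration of how keepaway macro-actions sit on top of these 100 ms primitive commands, the following sketch uses illustrative names (not the RoboCup soccer server API or the chapter's code); the macro-action names HoldBall, PassBall(k), GoToBall and GetOpen are those referred to in this chapter.

```java
// One primitive command is issued per 100 ms simulation cycle.
interface PrimitiveActions {
    void turn(double angle);               // turn(angle)
    void dash(double power);               // dash(power)
    void kick(double power, double angle); // kick(power, angle)
}

enum MacroAction { HOLD_BALL, PASS_BALL, GO_TO_BALL, GET_OPEN }

class KeeperPolicy {
    /** Translates the selected macro-action into a single primitive for this cycle. */
    void executeCycle(MacroAction a, PrimitiveActions out, double angleToTarget) {
        switch (a) {
            case HOLD_BALL:  out.kick(5.0, 180.0); break;           // small kick to keep the ball close (illustrative)
            case PASS_BALL:  out.kick(80.0, angleToTarget); break;   // strong kick toward teammate k
            case GO_TO_BALL:                                          // turn toward the ball, then dash
                if (Math.abs(angleToTarget) > 10.0) out.turn(angleToTarget); else out.dash(80.0);
                break;
            case GET_OPEN:   out.dash(60.0); break;                  // move to a free position (simplified)
        }
    }
}
```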
Figure 1. 3 vs. 2 keepaway task in 20 [m] × 20 [m]. (a) Object names; (b) Initial positions
Issues of Reward Shaping

Figure 3 shows the task classification of testbeds for multiagent and reinforcement learning from the viewpoint of the reward design problem. As we mentioned in the previous section, the difficulties of designing a reward are the lack of an explicit goal and the number of agents involved in a task.
First, in a continuing task there is no explicit goal to achieve, so the designer cannot tell when the reward should be given to the agent(s). Second, in a multiagent cooperative task, there exists a reward assignment problem of deciding the amount of reward to be allotted to each agent for achieving desirable teamwork. The keepaway task contrasts with a pursuit game, which is a traditional multiagent research testbed that has an explicit common goal. Because the pursuit game is an episodic task, the reward just has to be given when the hunters achieve their goal. In addition, it is easier to assign a hunter's (learner's) reward than the reward of a keeper (learner), because all four hunters definitely contribute to capturing the prey. Therefore, keepaway is classified as a harder task in terms of designing a reward, because we have no clues to define an explicit goal beforehand and to assign a reward to each agent.
From the aspect of a continuing task, we can refer to the case of single-agent continuing tasks, e.g., pole balancing, in which one episode consists of a period from the starting state to the failure state. In such a continuing task, an episode will always end with failure, and the penalty can help the design of a good reward measure to improve both teamwork and individual skills. In contrast, from the aspect of a multiagent task that includes both teammates and opponents, as in keepaway, it is hard to tell who contributed to keeping possession of the ball within the team. In other words, we should consider the case where some keepers contribute and others may not, or an opponent (taker) contributes by taking the ball. What has to be noted is that the episode of a multiagent continuing task ends with someone's failure. This problem has been discussed as credit assignment in time-extended single-agent task and multiagent task domains (Agogino, 2004).
Though the credit assignment issue is closely related to our research here, we design a reward function to evaluate the "last" state-action pair of each agent in the SMDP (semi-Markov Decision Process) domain, where each agent's action takes a different length of time, instead of assigning a reward to every state-action pair of each agent's whole state-action sequence. Here, we consider the reward design problem that consists of setting the amount of the reward value and the time of reward assignment so that we can optimize design issues of the reward measure in the multiagent learning process.

Figure 2. Takers' policy: 3 vs. 2 keepaway task in a 20 [m] × 20 [m] region. (i) Always GoToBall; (ii) GoToBall and BlockPass
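To ground the "when and how much" framing of the reward design problem, the following is a schematic sketch (names assumed, not the chapter's code) of the moment at which a reward is assigned in this SMDP setting: each keeper remembers the start time of its last macro-action, and only that last state-action pair is evaluated when the episode terminates. The function g stands for whichever reward definition is chosen; the concrete alternatives compared in this chapter appear later as Equations (4) and (5).

```java
import java.util.function.DoubleUnaryOperator;

class Keeper {
    double lastActionTime;  // LastActionTime: when the keeper's last macro-action started
    void onActionSelected(double simTime) { lastActionTime = simTime; }
}

class RewardAssigner {
    /** Called once, when the takers gain possession or the ball leaves the region. */
    void onEpisodeEnd(Keeper[] keepers, double taskEndTime, DoubleUnaryOperator g) {
        for (Keeper k : keepers) {
            double t = taskEndTime - k.lastActionTime;  // duration since the last action
            double reward = g.applyAsDouble(t);         // amount of reward: the design choice
            // ...update the value estimate of k's last state-action pair with this reward
        }
    }
}
```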
In Figure 4, from lines 4 to 8, the feature vectors Ft and Ft+1 are made by tile-coding. In our experiments, we use primarily single-dimensional tilings: 32 tilings are overlaid with a 1/32 offset, and each tiling is divided into ntile segments. Each segment is called a tile, and each state variable lies in a certain tile, which is called an active tile. In keepaway, the distances between players and angles are represented as state variables. In the case of 3 vs. 2, the total number of state variables is 13, which consists of 11 distance variables and 2 angle variables. In our experiments, ntile = 10 for each of the 11 distance variables, and ntile = 18 for each of the 2 angle variables. Accordingly, the total number of tiles Ntile is 4672. Each state variable is represented in a feature vector F whose components sum to one, and each component Fi is given as follows:

Fi = ntile / Ntile (the i-th tile is active)
Fi = 0 (otherwise)    (3)

Previous Work in Reward Design

For the reward r appearing in line 7 of Figure 4, rs defined by Equation (4) (Stone, 2005) is commonly used for keepaway. Here, CurrentTime is the simulation time when the keeper holds the ball or the episode ends, and LastActionTime is the simulation time when the keeper selects the last action. We hereafter refer to the function in Equation (4) as rs. In this approach, a reward is defined as the amount of time between the last action and the current time (or end time). That is, as the amount of time increases after taking the ball, the amount of reward given to the keeper also increases.

rs = CurrentTime - LastActionTime    (4)

This approach seems reasonable and proper. However, there are some problematic cases when using rs, as shown in Figure 5, for example. In Figure 5(b), K1, who is the current ball holder, gets a larger reward than the one in Figure 5(a). Consequently, the keeper passes the ball to the intercepting takers on purpose, because the ball will bounce back directly to K1, and then K1 is paid some reward for selecting this action. This action seems to yield a larger reward than other actions. K1 in Figure 5(d) gets a larger reward than the one in Figure 5(c) when the reward is defined by rs. Consequently, the keeper is likely to pass to the teammate (keeper) who is in the farthest position to get a larger reward, rather than pass to the nearer one. Although it seems reasonable, we cannot tell which one is the better strategy because it depends on the amount of noise and the keepers' skill.
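A minimal sketch of the tile-coding state representation described above may make Equation (3) concrete; the class and parameter names are assumptions, not the chapter's implementation. Every state variable is covered by 32 tilings offset by 1/32 of a tile width, and each active tile contributes ntile/Ntile to the feature vector, so the vector sums to one (for 11 distance variables with ntile = 10 and 2 angle variables with ntile = 18, Ntile = 4672).

```java
public class KeepawayTileCoder {
    static final int NUM_TILINGS = 32;

    private final int[] nTile;        // segments per tiling for each state variable
    private final double[] min, max;  // assumed value range of each state variable
    private final int totalTiles;     // Ntile (4672 in the 3 vs. 2 setting)

    public KeepawayTileCoder(int[] nTile, double[] min, double[] max) {
        this.nTile = nTile; this.min = min; this.max = max;
        int n = 0;
        for (int v = 0; v < nTile.length; v++) n += NUM_TILINGS * nTile[v];
        this.totalTiles = n;
    }

    /** Builds the sparse feature vector F for one observed state, as in Equation (3). */
    public double[] features(double[] state) {
        double[] f = new double[totalTiles];
        int base = 0;
        for (int v = 0; v < state.length; v++) {
            double width = (max[v] - min[v]) / nTile[v];
            for (int t = 0; t < NUM_TILINGS; t++) {
                double offset = width * t / NUM_TILINGS;                // 1/32 offset per tiling
                int idx = (int) ((state[v] - min[v] + offset) / width);
                idx = Math.min(nTile[v] - 1, Math.max(0, idx));          // clamp to this tiling
                f[base + t * nTile[v] + idx] = (double) nTile[v] / totalTiles; // ntile / Ntile
            }
            base += NUM_TILINGS * nTile[v];
        }
        return f;
    }
}
```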
Figure 5. Problematic situations. (a) K1 takes HoldBall(); (b) K1 takes PassBall(); (c) K1->K2->K1; (d)
K1->K3->K1
These examples show the difficulty of designing a reward function for a continuing task, as previously mentioned.

Reward Design for Collective Responsibility

In the well-known continuing task of pole balancing, the amount of reward is defined by Equation (5). The agent receives either 0 or -1 for a successful or a failure condition, respectively. We follow the scheme that the reward makes an agent do what it takes to achieve, not how it achieves it (Sutton, 1998). From this standpoint, the reward value has to be constant during a successful situation such as pole balancing, because we do not know which action achieves the task. While rs by Equation (4) (Stone, 2005) provides a differentially programmed reward for each step, as shown in Figure 6, it is usually difficult to say whether these values become an appropriate indicator for keeping a successful situation. Therefore, we introduce a novel reward function based on a constant reward sequence (Figure 6) to reduce the harmful effects of the reward design on the emerging behavior.

r = -1 (under a failure condition)
r = 0 (otherwise)    (5)

The major difference between the domain of pole balancing and keepaway is the number of agents involved. Unlike the single-agent case, in which one agent is responsible for the failure or success of the task, responsibility is diffused in the multiagent case. However, in the keepaway
Figure 8. Learning curve under the various reward functions (moving average of 100 episodes) against
takers’ policy (i)
Figure 9. Learning curves with the best-performance function and the existing function (moving average
of 1000 episodes) against takers’ policy (i)
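The learning curves in Figures 8, 9 and 13 are plotted as moving averages of the episode duration. As a hedged illustration (a helper of ours, not the authors' plotting code), such a smoothed curve over a window of 100 or 1000 episodes can be computed as follows:

```java
class LearningCurve {
    /** Moving average of episode durations over a fixed window of recent episodes. */
    static double[] movingAverage(double[] episodeDurations, int window) {
        double[] smoothed = new double[episodeDurations.length];
        double sum = 0.0;
        for (int i = 0; i < episodeDurations.length; i++) {
            sum += episodeDurations[i];
            if (i >= window) sum -= episodeDurations[i - window]; // drop the value leaving the window
            smoothed[i] = sum / Math.min(i + 1, window);          // average over the filled part of the window
        }
        return smoothed;
    }
}
```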
Table 2 shows the pass frequency after learning. Here, we find that the frequency of passing the ball was smaller for rf than for rs.

Effect of Takers' Policy

Here, we discuss the emerged behavior from the viewpoint of the takers' policies introduced in Figure 2(i) and (ii). It seems that the emerged behavior shown in Figure 11(a) is especially effective against the takers with policy-(i). Because both takers always select GoToBall(), keepers K2 and K3 are always free from these takers. Thus, to examine the availability of our reward function, rf, in a different situation where the takers have a more complicated policy, we apply our reward function to the situation where the takers act with policy-(ii). Figure 13 shows the comparison between the two learning curves with our keepers using rf and rs.
When the takers act with policy-(ii), the performance of keepers with rs is worse than for takers with policy-(i). As mentioned in Section 3.2, reward function rf has adverse effects on learning.
Table 1. Pass timing: distance to the nearest taker from the keeper with the ball after 25 hours

Distance [m]
Reward function    Takers' policy-(i)    Takers' policy-(ii)
rs                 5.60                  3.50
rf                 7.44                  7.38

Frequency [times/second]
Reward function    Takers' policy-(i)    Takers' policy-(ii)
Also, when the takers act with policy-(ii), rf makes the keepers possess the ball longer than function rs does. Figure 14 shows the results of the episode duration after the keepers experienced 25 hours of training against takers acting with policy-(ii). We found that keepers reinforced by rf could possess the ball at least twice as long as keepers reinforced by rs. However, episode dura-
Figure 13. Learning curves with the best reward function and the existing function (moving average of 1000 episodes) against takers' policy (ii)
of reward that is given to the agent is defined by Equation (5). The agent receives -1 or 0 for the failure or success condition, respectively.
The difference between pole balancing and keepaway is the number of agents involved. Unlike the single-agent case, in which one agent is responsible for the failure or success of the task, responsibility is diffused in the multiagent case. However, in the keepaway task, specifying the agent causing a failure seems much easier than specifying the agent contributing to the success. Therefore, we design reward functions for the keepaway task so that a keeper who terminates the episode receives a larger penalty (i.e., a smaller reward). Table 3 shows the comparison among three cases of keepaway (hand-coded, and two reward functions). Though the empirical results show that keepers using our function can possess the ball approximately three times longer than those hand-coded and those learned with the other reward function (Stone, 2005), the reason for this high performance has not been qualitatively analyzed yet.

Table 3. Comparison of average possession times (in simulator seconds) for hand-coded and learned policies against two types of takers in a region of 20 [m] × 20 [m]

First, we discuss the reward functions that we introduced. We introduce T = TaskEndTime - LastActionTime and give -1/T to the agent when the task fails. Otherwise, the agent receives 0 constantly during a successful situation, as mentioned in Section 3.4. The reason for sharing the same reward (= 0) among agents in the successful situation is that we cannot identify which agent contributes to the task solely by the length of keeping time. Since it is not always good for a team when one agent keeps a task longer locally, we do not introduce a predefined value of each agent's reward individually. In our design, each agent receives the reward f(tj) according to the amount of time from taking the ball (LastActionTime) to the end of the task (TaskEndTime) at the end of each episode. Among the introduced reward functions, rf = -1/tj provides relatively better performance than the other functions. Though the function f(tj) = 0.7^tj has a similar curve to rf when tj < 20, as shown in Figure 7, it does not perform as well as rf. The main reason for this result is the value range of T. The range of T is always larger than 20, and so the similarity in T < 20 does not have much effect on the performance.
Second, the keeper with rf passes the ball more frequently than the keepers with rs in the earlier stage of learning, as shown in Figure 12. Because the keeper with rs can receive some reward when selecting HoldBall in the early stage, the keeper tends not to pass the ball so many times to the other keeper. Meanwhile, our keepers reinforced by rf do not receive any penalty when they pass the ball; that is, they receive a penalty only when they are intercepted or miss the pass. So, our keepers are not afraid to pass to other keepers. In the middle and late learning stages, the keepers with rs pass the ball frequently because they experience a larger reward using PassBall(k) than
using HoldBall. However, the pass frequency of our keepers decreases because they experience having their ball intercepted or missing the pass after considerable training.
Third, as we described in Section 2.2, the visual information contains some noise. The passed ball often fails to reach the intended destination because of the noise, and so the noise has a large effect on the emerging behavior. Since the action of passing carries some probability of missing or being intercepted, the pass frequency of our keepers learned with rf becomes small. This is considered reasonable and proper behavior in a noisy environment and against takers' policy-(i), shown in Figure 2(i).
Fourth, we look at the effects of the takers' policy. We found in Figure 13 that our keepers with rf against takers' policy-(ii) (Figure 2(ii)) possess the ball less than in the case against takers' policy-(i) (Figure 2(i)). It seems that, as the frequency of the pass increases, the duration of the episode decreases. We found in Table 2 that the frequency of the pass becomes smaller when our keepers learn against takers' policy-(ii) in comparison with takers' policy-(i).
Last, we discuss the macro-actions we currently use. As the pass frequency increases, keepers do not have enough time to move to a position that is free from the opponent and cannot clear a path to let the ball pass from its current position. Consequently, the probability of missing a pass seems to increase. This problem might be resolved by introducing more sophisticated macro-actions such as GetOpen(), and so forth.

CONCLUSION

In this chapter, we discuss the issue of a revised version of tile-coding as the state representation and reward design for multiagent continuing tasks, and introduce an effective reward function for the keepaway domain. Though our experimental results show better performance than that of previous studies of keepaway (Stone, 2006), we are not yet able to provide a theoretical analysis of the results. At present, we have been examining the problem peculiar to a continuing task that terminates at failure, such as pole balancing, and reward assignment within a multiagent task in which simultaneous learning takes place. For the continuing task case, we show that a certain penalty causes an agent to learn successfully. Whereas, for the multiagent case, we avoid the harmful effect of the agents' simultaneous learning by parameter tuning. It is necessary to analyze the breakdown of the reward, such as which agent gets a greater penalty and which gets a lesser penalty, when designing a multiagent system.

REFERENCES

Agogino, A. K., & Tumer, K. (2004). Unifying Temporal and Structural Credit Assignment Problems. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multi-Agent Systems (pp. 980-987).

Arai, S., & Tanaka, N. (2006). Experimental Analysis of Reward Design for Continuing Task in Multiagent Domains. Journal of Japanese Society for Artificial Intelligence, 13(5), 537-546. (in Japanese)

Kuhlmann, G., & Stone, P. (2003). Progress in learning 3 vs. 2 keepaway. In Proceedings of the RoboCup-2003 Symposium.

Ng, A. Y., & Russell, S. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning (pp. 663-670). San Francisco, CA: Morgan Kaufmann.

Singh, S. P., & Sutton, R. S. (1996). Reinforcement Learning with Replacing Eligibility Traces. Machine Learning, 22(1-3), 123–158. doi:10.1007/BF00114726
Stone, P., Kuhlmann, G., Taylor, M. E., & Liu, Y. (2006). Keepaway Soccer: From Machine Learning Testbed to Benchmark. In Noda, I., Jacoff, A., Bredenfeld, A., & Takahashi, Y. (Eds.), RoboCup-2005: Robot Soccer World Cup IX. Berlin: Springer Verlag. doi:10.1007/11780519_9

Stone, P., & Sutton, R. S. (2002). Keepaway Soccer: a machine learning testbed. In Birk, A., Coradeschi, S., & Tadokoro, S. (Eds.), RoboCup-2001: Robot Soccer World Cup V (pp. 214–223). doi:10.1007/3-540-45603-1_22

Stone, P., Sutton, R. S., & Kuhlmann, G. (2005). Reinforcement Learning for RoboCup Soccer Keepaway. Adaptive Behavior, 13(3), 165–188. doi:10.1177/105971230501300301

Sutton, R., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

ADDITIONAL READING

Agogino, A. K., & Tumer, K. (2004). Efficient evaluation functions for multi-rover systems. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2004) (pp. 1-12).

Agogino, A. K., & Tumer, K. (2005). Multi-agent reward analysis for learning in noisy domains. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (pp. 81-88).

Agogino, A. K., & Tumer, K. (2008). Efficient Evaluation Functions for Evolving Coordination. Evolutionary Computation, 16(2), 257–288. doi:10.1162/evco.2008.16.2.257

Erez, T., & Smart, W. D. (2008). What does shaping mean for computational reinforcement learning? In Proceedings of the 7th IEEE International Conference on Development and Learning (pp. 215-219).

Grzes, M., & Kudenko, D. (2008). Multigrid Reinforcement Learning with Reward Shaping. LNCS, 5163, 357–366.

Konidaris, G., & Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (pp. 489-496).

Marthi, B. (2007). Automatic shaping and decomposition of reward functions. In Proceedings of the 24th International Conference on Machine Learning (pp. 601–608).

Mataric, M. J. (1994). Reward functions for accelerated learning. In Proceedings of the 11th International Conference on Machine Learning (pp. 181-189).

Ng, A., Harada, D., & Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th International Conference on Machine Learning (pp. 278-287).

Taniguchi, T., & Sawaragi, T. (2006). Construction of Behavioral Concepts through Social Interactions based on Reward Design: Schema-Based Incremental Reinforcement Learning. Journal of Japan Society for Fuzzy Theory and Intelligent Informatics, 18(4), 629-640. (in Japanese)

Tumer, K., & Agogino, A. K. (2006). Efficient Reward Functions for Adaptive Multi-Rover Systems. In Learning and Adaptation in Multi Agent Systems (pp. 177–191). LNAI.

Wolpert, D. H., & Tumer, K. (2001). Optimal payoff functions for members of collectives. Advances in Complex Systems, 4(2/3), 265–279. doi:10.1142/S0219525901000188

Wolpert, D. H., Tumer, K., & Bandari, E. (2004). Improving search algorithms by using intelligent coordinates. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 69, 017701. doi:10.1103/PhysRevE.69.017701
KEY TERMS AND DEFINITIONS

Reward Shaping: A technique to make a reinforcement learning agent converge to a successful policy for rational behavior.
Continuing Task: Has no explicit goal to achieve, but the task requires the agent to keep the desirable state(s) as long as possible.
RoboCup Soccer: See http://www.robocup.org/
Keepaway: Consists of keepers who try to keep possession of the ball, and takers who attempt to take possession of the ball within a limited region. The episode terminates whenever takers take possession or the ball runs out of the region, and then players are reset for a new episode.

ENDNOTE

1. Learning To Play Keepaway: http://www.cs.utexas.edu/users/AustinVilla/sim/keepaway/
Chapter 14
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems
Saba Mahmood
School of Electrical Engineering and Computer Science (NUST-SEECS), Pakistan
Azzam ul Asar
Department of Electrical and Electronics Engineering, NWFP University of Engineering and Technology, Pakistan
Hiroki Suguri
Miyagi University, Japan
ABSTRACT
In open multiagent systems, individual components act in an autonomous and uncertain manner, thus making it difficult for the participating agents to interact with one another in a reliable environment. Trust models have been devised that can create a level of certainty for the interacting agents. However, trust requires reputation information that basically incorporates an agent's former behaviour. There are two aspects of a reputation model, i.e., reputation creation and its distribution. Dissemination of this reputation information in a highly dynamic environment is an issue and needs attention for a better approach. We have proposed a swarm intelligence based mechanism whose self-organizing behaviour not only provides an efficient way of reputation distribution but also involves various sources of information to compute the reputation value of the participating agents. We have evaluated our system with the help of a simulation showing the utility gain of agents utilizing the swarm based reputation system. We have utilized an ant net simulator to compute results for the reputation model. The ant simulator is written in C# and utilizes .NET charting capabilities to graphically represent the results.
DOI: 10.4018/978-1-60566-898-7.ch014
Thus we can say that the two approaches to trust come under reputation creation and distribution.
Trust has evolved as the most recent area in the domain of information systems. A wide variety of trust and reputation models have been developed in the past few years. Basically, we have divided the models into two areas, centralized and decentralized.

Centralized Reputation Mechanism

Online electronic communities manage the reputation of all the users in a centralized manner, for example eBay and SPORAS. eBay is implemented as a centralized system where users can rate the interactions of other agents in the past and also leave some textual comments about their behaviour. For example, in an eBay interaction a user can rate its partner on the scale –1, 0 or +1, which means negative, neutral or positive ratings respectively. These ratings are stored centrally and the reputation value is computed as the sum of those ratings over six months (Huynh, Jennings and Shadbolt 2006).

SPORAS

SPORAS (Maes & Zacharia, 2000) extends the online reputation systems by introducing a new method for rating aggregation. Specifically, instead of storing all the ratings, each time a rating is received it updates the reputation of the involved party using the following algorithm:
to the feedback provided by the other parties, which reflects their trustworthiness in the latest transaction.
4. Users with very high reputation values experience much smaller rating changes after each update.
5. Ratings must be discounted over time so that the most recent ratings have more weight in the evaluation of a user's reputation.

SPORAS provides a more sophisticated model as compared to the previous model. However, it is designed for a centralized system that is unable to address the issues related with open MAS. In decentralized systems, each agent can carry out trust evaluation itself without a central authority. The following sections give details of some of the decentralized models.

Jurca and Faltings

Jurca and Faltings introduce a reputation system (Jurca & Faltings 2003) where agents are incentivised to report truthfully about their interaction results. They define a set of broker agents called R agents whose tasks are buying and aggregating reports from other agents and selling back reputation information to them when they need it. All reports about an agent are simply aggregated using the averaging method to produce the reputation value for that agent. Though the agents are distributed in the system, each of them collects and aggregates reputation reports centrally.
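For illustration, the following is a minimal sketch (assumed names, not code from the cited systems) of the two centralized aggregation styles just described: an eBay-like sum of -1/0/+1 ratings over the last six months, and the plain averaging used by the broker (R) agents of Jurca and Faltings.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class RatingStore {
    private static class Rating {
        final int value; final Instant when;
        Rating(int v, Instant w) { value = v; when = w; }
    }

    private final List<Rating> ratings = new ArrayList<>();

    /** Stores one rating; value is expected to be -1, 0 or +1. */
    void rate(int value, Instant when) { ratings.add(new Rating(value, when)); }

    /** eBay-style reputation: sum of the ratings received in roughly the last six months. */
    int sumOverSixMonths(Instant now) {
        return ratings.stream()
                .filter(r -> Duration.between(r.when, now).toDays() <= 182)
                .mapToInt(r -> r.value)
                .sum();
    }

    /** Broker-style reputation: simple average of all collected reports. */
    double average() {
        return ratings.stream().mapToInt(r -> r.value).average().orElse(0.0);
    }
}
```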
form a Virtual Organization, then it's important for them to choose the most appropriate partner. This model is built upon probability theory. TRAVOS equips an agent (the trustier) with three methods for assessing the trustworthiness of another agent (the trustee).
First, the trustier can make the assessment based on the direct interactions it had with the trustee. Second, the trustier can assess the trustworthiness of the trustee based on the opinions provided by others in the system. Third, the trustier can assess the trustworthiness of another based on a combination of the direct interactions with, and the reputation of, the trustee. TRAVOS considers the behaviour of an agent as a probability that it will participate in a successful interaction and a probability that it will perform an unsuccessful interaction (untrustworthy behaviour). This abstraction of agent behaviour means that in this model the outcome of an interaction is a binary value (successful or not).

The FIRE Model

The FIRE model uses a wide variety of sources to compute the reputation value of an agent. These sources include IR, RR, WR and CR. IR is based on an agent's personal experience, while WR and CR are reputation information reported by neighboring agents. RR is a role-based reputation and involves some rule-based evaluation of an agent's repute. The model considers a scenario of producers and consumers with an assumption of only one type of service availability. The producers (providers) are categorized as good, ordinary, intermittent and bad, depending upon the quality of service they are rendering. The intermittent and bad providers are the most random ones. If a consumer agent needs to use the service, it can contact the environment to locate nearby provider agents. The consumer agent will then select one provider from the list to use its services. The selection process depends upon the reputation model of the agent. The consumer's satisfaction is calculated in terms of Utility Gain (UG). UG for bad and intermittent providers is less than for the good and average ones. The model has incorporated the factor of dynamism in order to address the changing environment of open MAS. The dynamism in FIRE is based on a population of agents that cannot exceed a particular threshold and the location of agents, with an assumption that some agents cannot change their location.

SWARM INTELLIGENCE

Swarm intelligence (SI) is an artificial intelligence technique based around the study of collective behaviour in decentralised, self-organised systems. The expression "swarm intelligence" was introduced by Beni and Wang in 1989 (Beni & Wang 1989), in the context of cellular robotic systems. SI systems are typically made up of a population of agents interacting locally with one another and with the environment. There is no central authority to dictate the behaviour of the agents. In fact, the local interactions among agents lead to the emergence of a global behaviour. Examples of systems like this can be found in nature, including ant colonies, bird flocking, animal herding, bacteria molding and fish schooling.
When insects work together collaboratively there is no apparent communication between them. In fact, they utilize the environment as a carrier of information. They make certain changes in the environment that are sensed by the insects.
There are two popular techniques in swarm intelligence, i.e. Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO). Ant colony algorithms mimic the behavior of simple ants trying to locate their food source. Ants following a certain path lay down a special chemical called pheromone; other ants follow the same path by sensing the pheromone concentration. Stigmergy is altering the state of the environment in a way that it will affect the behavior of others for whom the environment acts as a stimulus.
"Ant Colony Optimization" is based on the observation that ants will find the shortest path around an obstacle separating their nest from a target such as a piece of candy simmering on a summer sidewalk. As ants move around they leave pheromone trails, which dissipate over time and distance. The pheromone intensity at a spot, that is, the number of pheromone molecules that a wandering ant might encounter, is higher either when ants have passed over the spot more recently or when a greater number of ants have passed over the spot. Thus ants following pheromone trails will tend to congregate simply from the fact that the pheromone density increases with each additional ant that follows the trail. By exploitation of the positive feedback effect, that is, the strengthening of the trail with every additional ant, this algorithm is able to solve quite complicated combinatorial problems where the goal is to find a way to accomplish a task in the fewest number of operations. Research on live ants has shown that when food is placed at some distance from the nest, with two paths of unequal length leading to it, they will end up with the swarm following the shorter path. If a shorter path is introduced, though, for instance if an obstacle is removed, they are unable to switch to it. If both paths are of equal length, the ants will choose one or the other. If two food sources are offered, with one being a richer source than the other, a swarm of ants will choose the richer source. If a richer source is offered after the choice has been made, most species are unable to switch, but some species are able to change their pattern to the better source. If two equal sources are offered, an ant will choose one or the other arbitrarily.
Particle Swarm Optimization is a technique developed by Dr. Eberhart and Dr. Kennedy (Eberhart & Kennedy 2001), inspired by the social behavior of bird flocking or fish schooling. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. Each particle keeps track of its coordinates in the problem space, which are associated with the best solution (fitness) it has achieved so far. (The fitness value is also stored.) This value is called pbest. Another "best" value that is tracked by the particle swarm optimizer is the best value obtained so far by any particle among the neighbors of the particle. This location is called lbest. When a particle takes the whole population as its topological neighbors, the best value is a global best and is called gbest (Covaci 1999).

Implementation of SI

Agent Behavior

It is very hard to predict the behaviour of agents in an open multiagent system that is characterized by a high degree of uncertainty. In our system we have limited our model to the successful or unsuccessful interaction among the agents. The swarm based model considers the behaviour of an agent as its willingness to carry out a particular interaction. This behaviour is then propagated throughout the system. If the agent is unable to fulfil a certain interaction, it automatically gets isolated and is no longer called for the interaction in future.

Basic Concept

Our model is based upon the swarm intelligence paradigm. The ant colony algorithm, which is a subset of swarm intelligence, is utilized in our model (Figure 1). Ants and insects communicate with one another indirectly by making changes in the environment, a process called stigmergy. Ants following a particular path towards the food source lay down a special chemical called pheromone. If it is the valid path, more ants follow the same path, thereby increasing the pheromone concentration. So the newer generation of ants, given an option, automatically follow the path with the higher pheromone concentration.
In our model there are basically two types of ants: unicast and broadcast ants.
If no pheromone concentration is available, then the broadcast ant is sent to all the agents in the MAS. The ant that is able to find the valid path towards the food source then returns back to the source to complete the process. Unicast ants are then sent along the path in order to accumulate the pheromone concentration.
Each agent in our model is equipped with certain modules. Reputation calculation requires basically two sources of information, that is, the agent's direct experience and the experience of the agent's peers or neighbours. Our model consists of three modules that serve to maintain these sources of information: namely, the experience manager, the recommendation manager and the reputation manager. Each one of these is discussed in detail below.

Modeling Direct Experience

This component basically involves the personal experiences of the agent with the intended agent. The experience is quantified in terms of the pheromone value. If the agent interacted with a certain agent X and that interaction was successful, the experiencing agent will automatically record a value against the pheromone variable. This value is used as the agent's own personal experience with the agent under consideration.

Modeling Peer Experience

This component basically involves the reputation information from the peers. The peers maintain special tables that contain the pheromone information of their neighbors. If the agent does not contain any direct experience information, it has to utilize the information from the peers or neighbors. For doing so, it broadcasts the request to the neighbors and records a value against the pheromone variable for the replying agent.

Updating

The changing levels of pheromone concentration capture the dynamic behaviour of agents in a swarm intelligence based system. If a path is no longer followed by the agents, the pheromone level starts weakening with the passage of time.
Now the pheromone concentration might be high at some other path. If an agent leaves the current best path, then the agent may opt for some other path based upon the available reputation information and comes up with another path towards agents that require evaluation. Based on this phenomenon, the updating components of the model carry out the process. The final updating value is calculated at the reputation update component.
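A minimal sketch of the pheromone bookkeeping just described may help; the class name, decay rate and deposit value are assumptions, not the authors' implementation. A successful interaction or a returning backward ant reinforces the entry for a neighbour, while entries that are no longer refreshed evaporate over time, which is how the reputation of an unresponsive agent weakens.

```java
import java.util.HashMap;
import java.util.Map;

class PheromoneTable {
    private final Map<String, Double> pheromone = new HashMap<>(); // neighbour id -> pheromone level
    private final double evaporation = 0.1;    // assumed decay rate per time step
    private final double reinforcement = 1.0;  // assumed deposit per successful interaction

    /** Direct experience or a backward message strengthens the trail to an agent. */
    void reinforce(String agentId) {
        pheromone.merge(agentId, reinforcement, Double::sum);
    }

    /** Called periodically: trails that are not refreshed weaken with the passage of time. */
    void evaporate() {
        pheromone.replaceAll((id, level) -> level * (1.0 - evaporation));
    }

    /** The current pheromone level is used as the reputation value of that agent. */
    double reputationOf(String agentId) {
        return pheromone.getOrDefault(agentId, 0.0);
    }
}
```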
The Reputation Distribution Phenomenon

We describe the reputation distribution phenomenon with the help of a connected nodes scenario, where each node is connected with others and is also subject to dynamism. In this condition, if the node or agent is directly connected with the target agent, then the peer experience is equal to the personal experience. But if it is not directly connected with the target node or agent, the information of the neighbours or peers is utilized in order to compute the reputation value of the agent. Each agent maintains a table containing the pheromone information of the nodes immediately connected to it. The node learns through the personal experience of the neighbouring nodes. Thus the two important sources of information in building the reputation value of an agent are combined in a very novel manner. This phenomenon in fact also addresses the issue of distribution of reputation information.
Two types of messages are generated, called the forward message and the backward message. The forward message invokes the transmitter module of the nodes, which in turn calls the collector module to find whether the table contains information for the target agent. If yes, the transmitter module immediately generates the backward message. This backward message basically retraces the same path, updating the pheromone information of all the intermediate nodes maintained at the collector module. The processor module carries out the calculations based upon hop count and thus assigns the pheromone value before and also after the updating processes (Figure 4).
From the above discussion we can further elaborate the functionality of the Experience manager and the Recommendation manager. The Experience Manager contains the personal information of each node about its immediately connected agent or node. The Recommendation manager basically captures the learning behavior of the ACO, whereby the reputation of the target agent or node is evaluated on the basis of the personal experience of all the intermediate nodes with one another, thus making the target agent the best possible source of the service.
If, for example, the target under observation moves away, as is characteristic of a highly dynamic environment, the backward message won't be generated. Instead, the pheromone information contained at the last connected node or agent would go on decreasing with the passage of time; thus the information that the current service provider is no longer available, as its reputation has decreased, would be propagated to the source agent or node. So the source agent or node now has to consult some other group of surrounding neighbours or nodes in order to evaluate and propagate the reputation information, starting with the broadcast message.
Through a literature review we found that the latest model, FIRE, incorporated personal, certificate and witness reputation information. As for the certificate information, it is the node or neighbor to which the service providing agent delegates itself for use by the evaluating agent. It is just like the reference we mention in our CV. The biggest dilemma of this approach is that certificate reputation can sometimes propagate wrongful information.
If we look at the ant colony algorithm, we come to the conclusion that personal experience, which is the most highly rated reputation information, is used to build the witness or peer information. In addition, the certificate reputation comes into play in a different fashion with the generation of the backward message. If the target agent replies back and generates the backward message, it is in fact delegating a certificate to the last connected agent or node to be used by other neighboring nodes.
An important concern in trust and reputation models is how to capture the wrongful or deceitful behaviour of the agents. In our case the deceitful action can take the form of generating an illegal backward message, thereby attracting the source agent to the wrong path towards the destination agent. A solution was proposed by Lam and Leung (2004) whereby a unique id is assigned to each forward ant, a backward ant with the same id is generated that retraces the given path, and only the recorded id at each node is entertained as the backward message, so the problem of lies in the reputation mechanism can be easily dealt with. Lie detection has been one of the most important aspects of any reputation model. However, our algorithm can solve this problem and certain other issues related to false reputation propagation (Figure 5).
For example, the legitimate way to reach agent 4 is through agent 1. But somehow agent 2 starts generating wrong backward messages, making agent 0 reach agent 4 through it. The problem can be solved if nodes only entertain the backward messages with the ids recorded in special tables maintained at each node. Once a backward message is accepted by the agent, it deletes its id from the table to avoid duplication. Also, if it is unable to receive the backward message within a specified period of time, its id is treated as stale.

Figure 5. Lie detection example

THE ALGORITHM

The algorithm captures the phenomenon of reputation distribution using the ant colony algorithm. The collector module basically searches for the pheromone information of the requested agent. If found, it generates the backward message; otherwise, it repeats the same process and generates the broadcast message to all the agents in search of the pheromone information. The processing module receives the reputation values from the collector and carries out the required calculations as mentioned in the algorithm. The transmitter module responds to any request generated for the agent to acquire the reputation value.

Collector Module
{
    Check if the requested node's pheromone info is available
    If available then
        Call Backward()
    Else
        Call Broadcast()
}

Transmitter Module
{
    Call Collector Module
}

Broadcast()
{
    Send hello message in search of pheromone information to all the agents in the domain
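A hedged Java sketch of the module interplay described above may make the pseudocode concrete; the class, interface and method names are illustrative assumptions (the authors' simulator is written in C#). The collector looks up the requested agent's pheromone entry and either triggers a backward message along the known path or falls back to broadcasting a hello message to all agents.

```java
class ReputationNode {
    private final PheromoneTable table = new PheromoneTable(); // from the earlier sketch
    private final MessagingService net;                        // assumed messaging layer

    ReputationNode(MessagingService net) { this.net = net; }

    /** Transmitter module: answers any incoming reputation request. */
    void onReputationRequest(String requestedAgent, String requester) {
        collect(requestedAgent, requester);
    }

    /** Collector module: backward message if pheromone info exists, broadcast otherwise. */
    private void collect(String requestedAgent, String requester) {
        double level = table.reputationOf(requestedAgent);
        if (level > 0.0) {
            net.sendBackward(requester, requestedAgent, level); // retrace the path with the value
        } else {
            net.broadcastHello(requestedAgent);                 // ask every agent in the domain
        }
    }
}

/** Assumed interface standing in for the underlying ant-net transport. */
interface MessagingService {
    void sendBackward(String toNode, String aboutAgent, double pheromone);
    void broadcastHello(String aboutAgent);
}
```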
Every node holds a pheromone table for all other nodes of the network. Each pheromone table holds a list of table entries containing all the connected nodes of the current node.

Simulation Results

The simulation results are computed in terms of the average hop counts in various scenarios. The reputation model, however, shows its efficiency in different scenarios in terms of the Utility Gain (UG) value. Utility Gain can be defined as the level of satisfaction an agent gains after interacting with any other agent to consume certain services. UG is inversely proportional to the average number of hop counts.

Experimental Test Bed

To evaluate the proposed reputation model for multiagent systems, we required an open system whose dynamic nature is captured by the model. Therefore, we chose a mobile adhoc network as a case study to find the effectiveness of our model. An adhoc network involves a collection of mobile nodes that are dynamically and arbitrarily located. The interconnections between the nodes change on a continual basis.
Each node in mobile adhoc networks (Manets) decides about its own route. In fact, there are no designated routers, and nodes forward packets from node to node in a multi-hop fashion. Node movement implies that current routes become invalid at future instances (Huynh, Jennings, Shadbolt 2006; Patel 2007). In such an environment, quality of service (QoS) is a big issue in terms of secure paths, bandwidth consumption, delays, etc. Our reputation model can be applied to Manets in order to show its applicability and robustness. The nodes in Manets can be thought of as agents in multiagent systems. Let's take a simple problem of route discovery in Manets with a reputation mechanism. Mobile adhoc networks are open systems and constantly undergo change in agent topology. This inherent uncertainty in Manets makes them a very suitable candidate for the evaluation of the proposed reputation model. The reputation model for such an environment needs to possess the following properties.

• It should take a variety of sources of information in order to have a more robust and reliable reputation value.
• Since there is no central authority in the system, each agent should be equipped with the reputation model in order to interact with one another.

The swarm intelligence based model not only provides a mechanism to calculate the reputation value of an agent but in fact also provides the mechanism that is utilized to disseminate that reputation information to all the agents in the system, under high dynamism.
We have evaluated our system on the basis of hop counts only. The basic phenomenon behind our algorithm is the concentration level of pheromone. As the level of trust in agents increases, so does the concentration of the pheromone variable held by every agent in the system. The reputation value degrades with the weakening level of the pheromone variable. The number of hop counts efficiently captures this. As the reputation gets built around certain agents, the probability of selecting them for any future service becomes higher. Therefore, the same destination that would have been reached by trial and error can now be consulted for any service efficiently and effectively by the inherent capability of the proposed algorithm. Previous trust and reputation models have incorporated various methods to measure the UG attained by agents in the system, which depicts the level of satisfaction that an agent gains from another after the interaction. We have made the number of hop counts the very basis of the UG in our proposed scheme. As the number of hops decreases in different scenarios, UG increases; thus they are inversely proportional to each other.
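As a worked reading of this inverse relation, if one assumes the simplest such form, UG = c / (average hop count) for some constant c, then cutting the average hop count from 5 to 4 would raise UG by a factor of 5/4 = 1.25, i.e., a 25% gain; the value of c itself does not matter for the relative comparisons reported below.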
Since we are utilizing a special case of Manets, for evaluation we take average hop counts as the measure of effectiveness of our model. The lower the number of hops required in any particular scenario, the more adaptable our system is under certain conditions.
We tested the scenario with different parameters. For example, first we evaluated the system with the ant net algorithm set to off (Figure 6). The average hop count found is 5.08, as compared to 4.72 when the ant algorithm is set to on in the same scenario (Figure 7). The reduction in the number of hop counts is the result of the pheromone variable value. The agents in the system learn about each other's reputation based on the previously discussed phenomena.
When no ant algorithm is utilized, meaning that there is no learning capability in the process, then in order to utilize the same services agents have to seek more agents for further information, thereby increasing the average hop counts.
The adaptability of the system to the dynamic environment is captured by making certain nodes unavailable. Let's say we have removed nodes 5, 9, and 11 from the system, which can no longer be utilized to propagate the reputation information. Under such circumstances, the average hop count comes to 4.89. In comparison, in the same situation when the ant algorithm is switched off, the average hop count comes to 5.43.
Capturing dynamism in the open environment is one of the important targets of the interactions among agents in open multi agent systems (Figures 8 and 9).
However, even the recent FIRE model was unable to accurately capture the highly dynamic environment. When certain nodes (agents) are removed from the system, the average number of hop counts increases if no swarm algorithm is utilized, as compared to when the swarm algorithm is utilized. The average hop count is lower, showing the system is able to adapt itself to the changing conditions (Figure 10).
Looping is an important issue in the case of Manets, in order to detect and correct duplicate information travelling in the network.
If looping is allowed, the average hop count comes to 6.69, while if it is set to off the average hop count computes to 5.43 (Figures 11 through 14).
The above results show that the swarm intelligence based reputation mechanism is able to learn from the changes taking place in the system
and can quickly adapt itself to the changing conditions, as opposed to the system when no ant net algorithm is utilized.
Compared to previous recent work in the domain of reputation models, the FIRE model states that in order to have a robust reputation the model should be able to gather information from diverse sources. The FIRE model has incorporated personal experience, witness reputation, rule based and certificate reputation. The swarm intelligence reputation model inherently incorporates different types of information in order to come up with the final reputation value of the agent. On top of that, these sources of information have learning capability and can adapt themselves to the changing nature of open MAS. This fact is apparent from the simulation results. Thus we can further depict the effectiveness of the proposed model in terms of utility gain (Figures 15 and 16).
From the charts it is visible that when the ant algorithm is utilized the utility gain increases for the same number of interactions, as opposed to when no ant algorithm is utilized. Similarly, UG in the case of changing conditions also shows a marked increase, showing that the system is able to learn and adapt itself. Comparing our results with the research aims set, we find that the proposed reputation model truly captures the dynamism in the agent environment. The reputation information is not held centrally and exists in distributed form, and finally it captures information from various sources in order to compute the final reputation value.
In previous work, weights were assigned to the different sources of information; in particular, information gained from neighbours is weighted lower because of the risk associated with it. In the swarm-based system, the weight given to peer information is regulated naturally: if an agent does not reply within a particular period of time, it loses reputation value through the evaporating pheromone concentration, which makes it less attractive for other agents to consult for information.
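To make the evaporation idea concrete, the sketch below shows one minimal way such a decaying reputation weight could be implemented. The class name, the evaporation rate, the reinforcement amount, and the idea of reinforcing on a timely reply are illustrative assumptions, not the chapter's actual implementation.

```python
class PheromoneReputation:
    """Minimal sketch of a pheromone-style reputation entry (illustrative only)."""

    def __init__(self, initial=1.0, evaporation_rate=0.1, reinforcement=0.5):
        self.value = initial                    # current pheromone concentration
        self.evaporation_rate = evaporation_rate
        self.reinforcement = reinforcement

    def on_reply(self):
        # A timely reply reinforces the trail, raising the agent's reputation.
        self.value += self.reinforcement

    def on_timeout(self):
        # No reply within the allowed period: the pheromone simply evaporates,
        # so the agent becomes less attractive to consult.
        self.value *= (1.0 - self.evaporation_rate)


# Example: an agent that stops replying gradually loses its reputation value.
rep = PheromoneReputation()
for _ in range(5):
    rep.on_timeout()
print(round(rep.value, 3))   # ~0.59 after five missed replies
```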
Overall Performance

We evaluated the overall performance of the proposed model by computing the percentage gain in utility in the special case of dynamism, and compared the result with the percentage gain obtained under the FIRE model. We computed our data from the simulation results obtained after removing certain agents from the system and measuring the average hop counts at 21, 41, 61 and 81 interactions. The data are given in Table 1; the corresponding percentage change in the utility gain of agents using the FIRE model to compute reputation values is given in Table 2.

Table 1.
Number of interactions   21     41     61      81
UG                       4      4.2    4.5     4.4
%Gain                    -      5%     7.1%    -2.2%

Table 2.
Number of interactions   21     41     61      81
UG                       6.3    6.8    6.5     6.4
%Gain                    -      7%     -4.4%   -1.5%

By analyzing the percentage change in the utility gains of the two systems for the same number of interactions, we find that the swarm-based system is more adaptable to the dynamism in the environment than the earlier FIRE model. From these results we can deduce that the utility gain based on hop counts yields more value than the utility gained by agents that use the FIRE model to compute reputation values.
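The %Gain rows in Tables 1 and 2 are simple relative changes between successive UG measurements. A minimal sketch of that arithmetic follows; the function name is ours, and note that the raw numbers put the first FIRE-model change closer to 7.9% than the 7% reported in the table.

```python
def percent_gain(ug_series):
    """Percentage change between successive utility-gain (UG) measurements."""
    return [round(100.0 * (b - a) / a, 1) for a, b in zip(ug_series, ug_series[1:])]

print(percent_gain([4.0, 4.2, 4.5, 4.4]))   # [5.0, 7.1, -2.2]  (Table 1)
print(percent_gain([6.3, 6.8, 6.5, 6.4]))   # [7.9, -4.4, -1.5] (Table 2)
```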
Lam, K., & Leung, H. (2004). An Adaptive Boukerche, A., & Li, X. (2005). An Agent-based
Strategy for Trust/ Honesty Model in Multi-agent Trust and Reputation Management Scheme for
Semi- competitive Environments. In Proceedings Wireless Sensor Networks. In Global Telecom-
of the 16th IEEE International Conference on Tools munications Conference, 2005. GLOBECOM
with Artificial Intelligence(ICTAI 2004) ‘05. IEEE.
Patel, J. (2007). A Trust and Reputation Model For Fullam, K., Klos, T., Muller, G., Sabater, J.,
Agent-Based Virtual Organizations. Phd thesis in Schlosser, A., Topol, Z., et al. (2005). A Specifi-
the faculty of Engineering and Applied Science cation of the Agent Reputation and Trust (ART)
School of Electronics and Computer Sciences Testbed. In Proc. AAMAS.
University of South Hampton January 2007.
Gunes, M., Sorges, U., & Bouazizi, I. (2002).
Ramchurn, S. D., Huynh, D., & Jennings, N. ARA- the ant Based Routing Algorithm for MA-
R. (2004). Trust in multiagent Systems. The NETs. International workshop on Adhoc network-
Knowledge Engineering Review, 19(1), 1–25. ing (IWAAIN 2002) Vancouver British Columbia
doi:10.1017/S0269888904000116 Canada, August 18-21 2002.
Schlosser, A., Voss, M., & Bruckner, L. (2004). Hughes, T., Denny, J., & Muckelbauer, P. A.
Comparing and evaluating metrics for reputation (2003). Dynamic Trust Applied to Ad Hoc Net-
systems by simulation. Paper presented at RAS- work Resources. Paper presented at 6th Work shop
2004, A Workshop on Reputation in Agent Societ- on Trust, Privacy Deception and Fraud In Agent
ies as part of 2004 IEEE/WIC/ACM International Societies Melbourne 2003
Joint Conference on Intelligent Agent Technology
Kagal, L., Cost, S., Finin, T., & Peng, Y. (2001).
(IAT’04) and Web Intelligence (WI’04), Beijing
A Framework for Distributed Trust Management.
China, September 2004.
Paper presented at the Second Workshop on Norms
Zacharia, G., & Maes, P. (2000). Trust manage- and Institutions in MAS, Autonomous Agents.
ment through reputation mechanisms. Applied
Kennedy, J., & Eberhart, R. C. (2002). Book
Artificial Intelligence Journal, 14(9), 881–908.
Review Swarm Intelligence. Journal of Genetic
doi:10.1080/08839510050144868
Programming and Evolvable Machines, 3(1).
Li, X., Hess, T.J., & Valacich, J.S. (2006). Us-
ing Attitude and Social Influence to Develop an
ADDITIONAL READING
Extended Trust Model for Information Systems.
Abdui-Rahman, A., & Hailes, S. (1997). A Dis- Database for advances in Information Systems.
tributed Trust Model. Paper presented at the 1997 Liu, J., & Issarny, V. (2004). Enhanced Reputa-
New Security Paradigms Workshop. Langdale, tion Mechanism for Mobile Ad Hoc Networks. In
Cumbria UK. ACM. Proceedings of iTrust 2004, Oxford UK.
Amir Pirzada, A., Datta, A., & McDonald, C. Mars, S. P. (1994). Formalising Trust as a Com-
(2004). Trusted Routing in Ad-hoc Networks using putational Concept. Department of Computing
Pheromone Trails. In Congress of Evolutionary Science and Mathematics University of Stirling
Computation, CEC2004. IEEE. April 1994 PhD thesis
Botely, L. (n.d.). Ant Net Simulator. University
of Sussex, UK.
Marsh, S., & Meech, J. (2006). Trust in Design. Song, W. (2004). Neural Network-Based Reputa-
National Research Council Canada Institute of tion Model in a Distributed System. In Proceed-
Information Technology. ings of the IEEE International Conference on
E-Commerce Technology. IEEE.
Nurmi, P. (2005). Bayesian game theory in prac-
tice: A framework for online reputation systems. Teacy, W.T.L., Huynh, T.D., Dash, R.K., Jennings,
University of Helsinki Technical report, Series of N.K., & Patel, J. (2006). The ART of IAM: The
Publications C, Report C-2005-10. Winning Strategy for the 2006 Competition.
Pujol, J. M., Sangüesa, R., & Delgado, J. (2002). Teacy, W.T.L., Patel, J., Jennings, N.R., & Luck,
Extracting Reputation in Multi Agent Systems by M. (2005). Coping with Inaccurate Reputation
Means of Social Network Topology. In Proceed- Sources: Experimental Analysis of a Probabilistic
ings of the First International Joint Conference Trust Model. AAMAS’05.
on Autonomous Agents and Multiagent Systems:
Theodorakopoulos, G., & Baras, J. S. (2006).
Part 1 (pp. 467-474). ACM.
On Trust Models and Trust Evaluation Metrics
Sabater, J. (2003). Trust and Reputation for Agent for Ad Hoc Networks. IEEE Journal on Selected
Societies. PhD thesis, University Autonoma de Areas in Communications, 24(2). doi:10.1109/
Barcelona. JSAC.2005.861390
Sabater, J. (2004). Toward a Test-Bed for Trust Xiong, L., & Liu, L. (2003). A Reputation-Based
and Reputation Models. In R. Falcone, K. Bar- Trust Model for Peer-to-Peer eCommerce Com-
ber, J. Sabater, & M. Singh (Eds.), Proc. of the munities. In Proceedings of the IEEE International
AAMAS-2004 Workshop on Trust in Agent Societ- Conference on E-Commerce (CEC’03).
ies (pp. 101-105).
Yamamoto, A., Asahara, D., Itao, T., Tanaka,
Schlosser, A., & Voss, M. (2005). Simulating Data S., & Suda, T. (2004). Distributed Pagerank: A
Dissemination Techniques for Local Reputation Distributed Reputation Model for Open Peer-
Systems. In Proceedings of the Fourth Interna- to-Peer Networks. In Proceedings of the 2004
tional Joint Conference on Autonomous Agents International Symposium on Applications and
and Multiagent Systems (pp. 1173-1174). ACM. the Internet Workshops (SAINTW’04).
Sierra, C., & Debenham, J. (2005). An Informa- Zheng, X., Wu, Z., Chen, H., & Mao, Y. (2006).
tion-based model for trust. AAMAS 05. Utrecht Developing a Composite Trust Model for Multi
Netherlands. agent Systems. In Proceedings of the Fifth Inter-
national Joint Conference on Autonomous Agents
Smith, M.J., & desJardins, M. (2005).A Frame-
and Multiagent Systems (pp. 1257-1259). ACM.
work for Decomposing Reputation in MAS in to
Competence and Integrity. AAMAS’05.
Chapter 15
Exploitation-Oriented Learning XoL: A New Approach to Machine Learning Based on Trial-and-Error Searches
Kazuteru Miyazaki
National Institution for Academic Degrees and University Evaluation, Japan
ABSTRACT
Exploitation-oriented Learning (XoL) is a new framework of reinforcement learning. XoL aims to learn a rational policy, whose expected reward per action is larger than zero, and does not require a sophisticated design of the value of each reward signal. In this chapter, as examples of learning systems that belong to XoL, we introduce the rationality theorem of Profit Sharing (PS), the rationality theorem of reward sharing in multi-agent PS, and PS-r*. XoL has several features. (1) Though traditional RL systems require appropriate reward and penalty values, XoL only requires an order of importance among them. (2) XoL can learn more quickly since it traces successful experiences very strongly. (3) XoL may be unsuitable for pursuing an optimal policy; the optimal policy can be acquired by the multi-start method, which resets all memories in order to search for a better policy. (4) XoL is effective on classes beyond MDPs, since it is a Bellman-free method that does not depend on DP. We show several numerical examples to confirm these features.
important applications (Merrick et al., 2007), generally speaking it is difficult to design RL systems that fit a real-world problem. We see two reasons for this. First, the interaction requires many trial-and-error searches. Second, there is no guideline on how to design the values of reward signals. Though these are not treated as important issues in theoretical research, they can become serious issues in real-world applications; in particular, if we assign inappropriate values to reward signals, we will obtain unexpected results (Miyazaki and Kobayashi, 2000). Inverse Reinforcement Learning (IRL) (Ng and Russell, 2000; Abbeel and Ng, 2005) is a method related to this design problem: if we input our expected policy to the IRL system, it can output a reward function that realizes the policy. IRL has several theoretical results, e.g. apprenticeship learning (Abbeel and Ng, 2005) and policy invariance (Ng et al., 1999).

On the other hand, we are interested in an approach in which reward signals are treated independently and do not require a sophisticated design of their values. Furthermore, we aim to reduce the number of trial-and-error searches by strongly reinforcing successful experiences. We call this approach Exploitation-oriented Learning (XoL). Examples of learning systems that belong to XoL include the rationality theorem of Profit Sharing (PS) (Miyazaki et al., 1994), the Rational Policy Making algorithm (Miyazaki et al., 1998), the rationality theorem of PS in multi-agent environments (Miyazaki and Kobayashi, 2001), the Penalty Avoiding Rational Policy Making algorithm (Miyazaki and Kobayashi, 2000) and PS-r* (Miyazaki and Kobayashi, 2003).

XoL has several features. (1) Though traditional RL systems require appropriate values of reward signals, XoL only requires an order of importance among them, which is in general easier than designing their values. (2) XoL can learn more quickly since it traces successful experiences very strongly. (3) XoL may be unsuitable for pursuing optimality; the optimal policy can be acquired by the multi-start method (Miyazaki et al., 1998), which resets all memories to search for a better policy. (4) XoL is effective on classes beyond MDPs, such as Partially Observable Markov Decision Processes (POMDPs), since it is a Bellman-free method that does not depend on DP (Sutton and Barto, 1998).

In this chapter, we focus on POMDP environments where there is only one type of reward. As examples of learning systems that belong to XoL in these environments, we introduce the rationality theorem of PS, the rationality theorem of PS in multi-agent environments, and PS-r*. We show several numerical examples to illustrate how to use these methods.

PROBLEM FORMULATIONS

Notations

Consider an agent in some unknown environment. The agent senses a set of discrete attribute-value pairs and performs one of a discrete set of actions. The environment provides a reward signal to the agent as a result of some sequence of actions. We denote sensory inputs as x, y, … and actions as a, b, … . A sensory input and an action constitute a pair that is termed a rule; we denote the rule "if x then a" as xa. In Profit Sharing (PS), a scalar weight indicating the importance of a rule is assigned to each rule; the weight of rule xa is denoted w_xa. The function that maps sensory inputs to actions is termed a policy. We call a policy rational if and only if the expected reward per action is larger than zero. Furthermore, a useful rational policy is a rational policy that is not inferior to the random walk (RA), in which the agent selects every action with the same probability in every sensory input. The policy that …
… resented by a state transition diagram in Figure 1. The node with a token denotes the sensory input at time t; three rules match this sensory input. Since the state transition is not deterministic, selecting the same rule does not always lead to the same state; the branching arcs indicate such cases. We term the part of the state transition diagram around one sensory input a conflict structure; Figure 1 is an example.

Figure 2a) shows an environment consisting of three sensory inputs, x, y and z, denoted by circles. Two actions, a and b, can be selected in each sensory input. An arrow means a state transition caused by executing the action labelled on the arrow. We term the rule sequence selected between two rewards, or between an initial sensory input and a reward, an episode. For example, when the agent selects xb, xa, ya, za, yb, xa, za and yb in Figure 2a), there are two episodes, (xb·xa·ya·za·yb) and (xa·za·yb), as shown in Figure 2b). We term a subsequence of an episode a detour when the sensory input of its first rule and the sensory output of its last rule are the same although the two rules differ; for example, the episode (xb·xa·ya·za·yb) has two detours. An episode is written as a rule sequence (r_W ⋯ r_i ⋯ r_2·r_1), where W denotes the reinforcement interval of the episode.

Properties of the Target Environments

We focus on POMDP environments where there is only one type of reward. In POMDPs, the agent may sense different states of the environment as the same sensory input; we call such a sensory input a Partially Observable (PO) sensory input.

We recognize that learning in POMDPs must overcome two deceptive problems (Miyazaki et al., 1998). We term the indistinguishability of state values, which are assigned to each state of the environment, a type 1 confusion. Figure 3a) is an example: the state value (v) is estimated by the minimum number of steps required to obtain a reward1. The values for states 1a and 1b are 2 and 8, respectively. Although 1a and 1b are different states, the agent senses them as the same sensory input 1, hatched in Figure 3. If the agent experiences the state 1a …
Figure 2. a) An environment consisting of three sensory inputs and two actions. b) An example of an episode and a detour
Figure 3. Examples of type 1 (a) and type 2 (b) confusions
Figure 4. Three classes of the target environments
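The sketch below illustrates how Profit Sharing could distribute credit over the rules of one episode such as (xb·xa·ya·za·yb). The dictionary-based weights and the geometrically decreasing reinforcement function f_i = R/M^(i-1) are a common choice in the PS literature and are our assumptions here, not a restatement of the chapter's experimental settings.

```python
from collections import defaultdict

M = 2                      # number of actions available in each sensory input
R = 100.0                  # reward given at the end of an episode

weights = defaultdict(float)   # weights[(sensory_input, action)] -> rule weight w_xa

def profit_sharing_update(episode, reward=R, num_actions=M):
    """Distribute credit backwards over the rules of one episode.

    episode: list of (sensory_input, action) rules, oldest first.
    Uses a geometrically decreasing reinforcement function
    f_i = reward / num_actions**(i-1), with i = 1 at the rewarded rule.
    """
    for i, rule in enumerate(reversed(episode), start=1):
        weights[rule] += reward / (num_actions ** (i - 1))

# The episode (xb, xa, ya, za, yb) from Figure 2b), written as rules:
episode = [("x", "b"), ("x", "a"), ("y", "a"), ("z", "a"), ("y", "b")]
profit_sharing_update(episode)
print(weights[("y", "b")], weights[("x", "b")])   # 100.0 and 6.25
```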
Figure 6. The problem used in the experiment
Figure 7. The state transition diagram used in the example

Application to a Maze-like Environment

Setting
The learning rate of QL is 0.02 and its discount rate is 0.8. Though QL guarantees acquisition of the optimal policy, it requires numerous trials. In this example, QL requires more initial fuel than PS to acquire a rational policy. Furthermore, though it is not shown in Figure 8, QL requires 5000 liters of fuel to guarantee acquisition of the optimal policy.

PROFIT SHARING IN MULTI-AGENT ENVIRONMENTS

Approaches to Multi-Agent Environments

In this section, we consider n (n>1) agents. At each discrete time step, an agent i (i=1,2,…,n) is selected from the n agents based on the selection probabilities P_i (P_i > 0, Σ_{i=1}^{n} P_i = 1); the selected agent senses the environment and performs an action. Each agent senses a set of discrete attribute-value pairs and performs one of M discrete actions. We denote agent i's sensory inputs as x_i, y_i, ⋯ and its actions as a_i, b_i, ⋯.

When the n'-th agent (0 < n' ≤ n) has a special sensory input on condition that (n'−1) agents already have special sensory inputs at some time step, the n'-th agent obtains a direct reward R (R>0) and the other (n−1) agents obtain an indirect reward μR (μ≥0). We call the n'-th agent the direct-reward agent and the other (n−1) agents indirect-reward agents. We do not have any information about n' or the special sensory input; furthermore, nobody knows whether the (n−1) agents other than the n'-th agent are important or not. A set of n' agents that are necessary for obtaining a direct reward is termed the goal-agent set. In order to preserve rationality in the multi-agent environment, all agents in a goal-agent set must learn a rational
policy; the weight of rule y₁a₁ is then updated as

ω_{y₁a₁} = ω_{y₁a₁} + (1/M)·μR.

Rationality Theorem of PS in Multi-Agent Environments

In order to preserve rationality in the multi-agent environments discussed in the previous section, all irrational rules in a goal-agent set must be suppressed. Conversely, if a goal-agent set is constructed from agents in which all irrational rules have been suppressed, rationality is preserved. Therefore, we can derive the necessary and sufficient condition on the range of μ that suppresses all irrational rules in some goal-agent set. We do not know the value of W in general; however, in practice we can set μ=0 if a reinforcement interval is larger than some number determined in advance, or if the assumption of the initial sensory input for a goal-direct agent has been broken. If we set L=M−1 and W0=W, i.e. the indirect-reward agents have the same reinforcement interval as the direct-reward agent, Equation (3) simplifies to

μ < 1 / ((M^W − 1)(n − 1)).    (4)
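A quick sketch of the threshold in Equation (4) is shown below; the function name is ours. With two actions (M=2), W=3 and three agents (n=3) it reproduces the μ<0.0714… value used for roulette a) in the next section.

```python
def mu_upper_bound(M, W, n):
    """Upper bound on the indirect-reward ratio from Equation (4) (L = M-1, W0 = W)."""
    return 1.0 / ((M ** W - 1) * (n - 1))

# Roulette a): two actions (M=2), reinforcement interval W=3, three agents (n=3)
print(mu_upper_bound(2, 3, 3))   # 0.0714..., the threshold quoted for roulette a)
```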
Application to Roulette-Like Environments

Setting

Consider the roulette-like environments in Figure 11. There are 3 and 4 learning agents in roulettes a) and b), respectively. The initial position of agent i (A_i) is S_i. The number shown in the centre of each roulette (from 0 to 8 or 0 to 11) is given to each agent as a sensory input. There are two actions for each agent: move right (20% failure) or move left (50% failure); if an action fails, the agent does not move. There is no situation in which two agents receive the same sensory input. At each discrete time step, A_i is selected based on the selection probabilities P_i (P_i > 0, Σ_{i=1}^{n} P_i = 1). (P_0, P_1, P_2) is (0.9, 0.05, 0.05) for roulette a), and (P_0, P_1, P_2, P_3) is (0.72, 0.04, 0.04, 0.2) for roulette b). When A_i reaches its goal G_i, P_i is set to 0.0 and the P_j (j≠i) are rescaled proportionally. When A_R reaches G_R on condition that every other A_i (i≠R) has already reached G_i, the direct reward R (=100.0) is given to A_R and the indirect reward μR is given to the other agents. When some agent obtains the direct reward, or A_i reaches G_j (j≠i), all agents return to the initial positions shown in Figure 11. The initial weights of all rules are 100.0.

If all agents learn the policy 'move right in any sensory input', the policy is optimal. If at least two agents learn the policy 'move left in the initial position', or 'move right in the initial position, and move left to the right of the initial position', the policy is irrational. If the optimal policy is not destroyed for 100 episodes, the learning is judged to be successful. We stop the learning if agents 0, 1 and 2 learn the policy 'move left in the initial position' or the number of actions exceeds ten thousand. Initially, we set W=3; if the length of an episode is larger than 3, we set μ=0. From Equation (4), we set μ<0.0714… for roulette a) and μ<0.0333… for roulette b) to preserve Theorem 2.
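The agent-selection scheme just described can be sketched as follows; the function names are ours, and the renormalization after an agent reaches its goal is a direct reading of "P_i is set to 0.0 and the others are rescaled proportionally".

```python
import random

def renormalize(probs, finished):
    """Set P_i = 0 for agents that reached their goal and rescale the rest proportionally."""
    active = {i: p for i, p in probs.items() if i not in finished}
    total = sum(active.values())
    return {i: (active[i] / total if i in active else 0.0) for i in probs}

def select_agent(probs):
    """Pick the agent that acts at this time step according to the selection probabilities."""
    agents, weights = zip(*probs.items())
    return random.choices(agents, weights=weights, k=1)[0]

# Roulette a): three agents with (P0, P1, P2) = (0.9, 0.05, 0.05)
probs = {0: 0.9, 1: 0.05, 2: 0.05}
probs = renormalize(probs, finished={0})   # A0 has reached G0
print(probs)                               # {0: 0.0, 1: 0.5, 2: 0.5}
```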
Results and Discussion

We show the quality, evaluated by how many times an irrational or the optimal policy is acquired in a thousand trials with different random seeds, and the speed, evaluated by the total number of actions needed to learn the optimal policy a thousand times, in Figure 12. Figures 12a) and 12b) give the results for roulettes a) and b), respectively, and Figures 13a) and b) give details of the speeds for roulettes a) and b), respectively.

Though Theorem 2 guarantees rationality, it does not guarantee optimality; however, in both roulettes the optimal policy was always learned even beyond the range of Theorem 2.
In roulette a), μ=0.3 makes the learning speed the best (Figures 12a) and 13a)). On the other hand, if we set μ≥0.4, there are cases in which irrational policies are learned. For example, consider the case where A_0, A_1 and A_2 in roulette a) obtain the three rule sequences in Figure 14; if we set μ=1.0, A_0, A_1 and A_2 approach G_2, G_0 and G_1, respectively. If we set μ<0.0714…, such irrational policies are not learned, and the learning speed is still improved. Though it is possible to improve the learning speed beyond the range of Theorem 2, we should stay within it to guarantee rationality in any environment.

In roulette b), A_3 cannot learn anything because there is no G_3. Therefore, if we set μ=0, the optimal policy is not learned (Figure 12b)); in this case we should use the indirect reward. Figures 12b) and 13b) show that μ=0.2 makes the learning speed the best. On the other hand, if we set μ≥0.3, there are cases in which irrational policies are learned. It is an important property of the indirect reward that the learning qualities exceed those of the μ=0 case. Though Theorem 2 only guarantees rationality, the numerical examples show that it is also possible to improve the learning speed and quality.

EXPANSION OF PS TO POMDPS

Approaches to POMDPs

The traditional approach to POMDPs is the memory-based approach (Chrisman, 1992; McCallum, 1995; Boutilier et al., 1996), which uses the history of sensor-action pairs, or a model, to identify the environmental states corresponding to a partially observable (PO) sensory input. Although the memory-based approach can attain
optimality, it is hardware-intensive, since it requires a huge memory.

To avoid this cost of the memory-based approach, a stochastic policy (Singh et al., 1994) has been proposed, in which the agent selects every action with non-zero probability in every sensory input in order to escape the PO sensory inputs. The simplest stochastic policy is the random walk (RA), which assigns the same probability to every action. The existing RL systems that learn a stochastic policy (Williams, 1992; Jaakkola et al., 1994; Kimura et al., 1995; Baird et al., 1999; Konda et al., 2000; Sutton et al., 2000; Aberdeen and Baxter, 2002; Perkins, 2002) are types of hill-climbing methods. They are often used in POMDPs since they can attain a local optimum; however, they cannot always improve on RA, and we know of cases where they converge to a policy worse than RA.

For example, in Figure 15 the average number of steps required to obtain a reward by RA is 4. If the agent selects action-a in S0 and S1, it can obtain a reward in 3 steps, which is the minimum number of steps required. On the other hand, if the agent selects action-b in S0, it requires 4 steps, the same as RA. When we improved RA using Stochastic Gradient Ascent (SGA) (Kimura et al., 1995), which is a type of hill-climbing method, the average number of steps required to obtain a reward was 3.78 over 100 trials: 25 of the 100 trials improved on RA, 73 were the same as RA, and the other 2 trials deteriorated below RA.2

The hill-climbing methods used in previous RL systems learn the stochastic policy well in many cases. However, once they converge to a local optimum, i.e., possibly a policy worse than RA, they cannot improve it further. To avoid this fault, we focus on PS, which belongs to XoL and is not a hill-climbing approach.
… not larger than that of RA. On the other hand, we can derive the following theorem in POMDPs where there is a type 2 confusion and there is only one type of reward.

Theorem 3 (Comparison between PS-r and RA in POMDPs)

The maximum value of the average number of steps to obtain a reward by Policy(PS-r), divided by that of RA, in POMDPs where there is only one type of reward is given by

r·(1 + (M−1)/r)^n / M^n,    (5)

where n is the upper bound of the number of different environmental states that are sensed as the same sensory input. ■

The proof is presented in Appendix C. Theorem 3 is derived from the worst case, in which the environment is constructed only from the most difficult environmental structure, termed structure W (see Figure 24 in Appendix C). Therefore, if there is no structure W in an environment, the behavior of Policy(PS-r) will be better than the estimate of Equation (5). Furthermore, when there is no type 2 confusion in some part of an environment and there is an irrational rule in it, its behavior improves further.

From PS-r to PS-r*

Improvement of PS-r to Fit POMDPs

If we do not identify the environmental states that correspond to a PO sensory input, we should use a stochastic policy to escape from the PO sensory inputs. Although the simplest stochastic policy is RA, the existing RL systems that learn a stochastic policy cannot always improve on RA; it is also possible for them to deteriorate it. On the other hand, we do not have to select an irrational rule in the non-PO (ØPO) sensory inputs. Therefore, we select only rational rules in ØPO sensory inputs and follow a stochastic policy in the other sensory inputs; in particular, we use RA as the stochastic policy to avoid changing to a policy worse than RA.

To implement this idea, it is important to judge whether a sensory input is PO. If a sensory input is ØPO, each transition probability to one of the following sensory inputs under a rule selected in that sensory input converges to some constant value, even if actions are selected according to different policies. On the other hand, if a sensory input is PO, the transition probability changes depending on the policy used to reach the sensory input. We therefore judge whether a sensory input is PO by comparing the transition probabilities under RA with those under another policy.

The comparison is executed using the χ²-goodness-of-fit test. It only requires the transition probabilities to the following sensory inputs for all rules, and therefore requires a memory of O(MN²), where M and N are the numbers of actions and sensory inputs; this is less than previous memory-based approaches. After testing a rule, given enough samples under RA and under the other policy, if a transition probability to one of the following sensory inputs is not coincident between RA and the other policy, we can regard the sensory input in which the rule can be selected as PO. Otherwise, if all the transition probabilities are coincident, the sensory input can be regarded as ØPO. Note that although some PO sensory inputs may remain undetermined, this can be resolved by changing the policy that is compared with RA.

In general, we require several actions in order to execute a correct χ²-goodness-of-fit test. When we set the significance level and the detection power to α and 1−β, respectively, the number of actions n required to achieve a correct test
regarding a transition by a rule can be estimated statistically by the following criterion:

n = (1/2)·[ (u(α) + u(2β)) / (sin⁻¹√π₁ − sin⁻¹√π₂) ]²,    (6)

where π₁ and π₂ are the transition probabilities to the following sensory input under the rule when RA and the other policy are used, respectively. u(·) is obtained from a normal distribution table; for example, setting α = 0.05 and β = 0.10 gives u(α) = 1.960 and u(2β) = 1.282.
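The following sketch evaluates the criterion in Equation (6). The square root inside the arcsine is our reading of the garbled formula (the usual variance-stabilizing transform for proportions), and the chapter does not restate the exact π values used for its worst case; with |π₁−π₂| = 0.05 around 0.5 the sketch gives a value close to the CV of 2059.09 quoted below.

```python
import math

def required_actions(u_alpha, u_2beta, pi1, pi2):
    """Number of actions n from Equation (6), using arcsine-transformed proportions."""
    d = math.asin(math.sqrt(pi1)) - math.asin(math.sqrt(pi2))
    return 0.5 * ((u_alpha + u_2beta) / d) ** 2

# u(alpha) = 1.960 and u(2*beta) = 1.282, as in the text (alpha = 0.05, beta = 0.10)
print(round(required_actions(1.960, 1.282, 0.525, 0.475)))   # about 2100
```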
The PS-r* Algorithm

We propose PS-r* to implement the above idea; the algorithm is shown in Figure 17. It is mainly divided into a learning mode and a test mode, and requires the following five types of memory: the first memory, used to determine rational rules, which is the same as in PS-r; ØPO-judge flags, one per rule, used to judge whether sensory inputs are PO; PO flags to store the result of the judgment (PO / ØPO / unknown), all sensory inputs being unknown at initialization; the number of times each rule has been selected in the two modes (NofR(learning) and NofR(test)), which requires a memory of O(MN); and the number of transitions to each following sensory input under each rule in the two modes (NofT(learning) and NofT(test)), which requires a memory of O(MN²).

PS-r* starts in the learning mode. In the learning mode, the agent selects actions according to RA in order to determine all rational rules; rational rules are determined using the same algorithm as PS-r. If all NofR(learning) become larger than CV, which is calculated from the upper bound of Equation (6), the mode is changed to the test mode. If we set α = 0.05, β = 0.10, and max|π₁−π₂| = 0.05, meaning that the maximum estimation error of a transition probability is 0.05, then CV is 2059.09.

In the test mode, if the agent senses an unknown sensory input, it returns to the learning mode. Otherwise, an action is selected according to Policy(PS-r*(test)), the policy learned in the learning mode. Usually an action is chosen by roulette selection in proportion to r in Policy(PS-r*(test)); however, we can use another policy, for example one in which some action is never selected, to improve the accuracy of the χ²-goodness-of-fit test.

If NofR(test) for a rule that can be selected in a sensory input not yet decided to be PO or ØPO becomes larger than CV, the χ²-goodness-of-fit test between NofT(learning) and NofT(test) for that rule is executed. If the result of the test indicates that they are not coincident, the sensory input in which the rule can be selected is PO. Otherwise, the ØPO-judge flag for the rule is raised. Subsequently, if all the ØPO-judge flags of the rules that can be selected in the same sensory input are raised, the sensory input is ØPO. PS-r* stops when all NofR(test) are larger than CV.

When we evaluate or utilize the policy learned by PS-r*, termed Policy(PS-r*), an action is selected by roulette selection in proportion to r, where r is set to 0 if it is less than 1 in the ØPO sensory inputs. Therefore, Policy(PS-r*) coincides with RA if all sensory inputs are PO. On the other hand, if there are several irrational rules in the ØPO sensory inputs, Policy(PS-r*) becomes increasingly better than RA.

Features of PS-r*

Policy(PS-r*) coincides with RA in the PO sensory inputs; therefore, if all sensory inputs are PO, this is the most difficult task for PS-r*. On the other hand, Policy(PS-r*) selects a rational rule in the ØPO sensory inputs; therefore, if there are many irrational rules in the ØPO sensory inputs, Policy(PS-r*) is increasingly better than RA. These are very important properties of PS-r* that are not guaranteed by PS-r or the
existing RL systems that learn a stochastic policy.

PS-r* requires a memory of O(MN²). This is larger than PS-r, which only requires O(MN), but much smaller than previous memory-based approaches.
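The judgement of PO versus ØPO sensory inputs described above compares the transition counts gathered in the learning mode (under RA) with those gathered in the test mode. The sketch below is one minimal way that comparison could look; the function and variable names, the example counts, and the use of SciPy's chi-square test of homogeneity are our assumptions, not the chapter's implementation.

```python
from scipy.stats import chi2_contingency

def sensory_input_is_po(counts_learning, counts_test, alpha=0.05):
    """Judge a sensory input PO if, for some rule, the transition counts gathered
    under RA (learning mode) and under the test-mode policy disagree."""
    for rule in counts_learning:
        table = [counts_learning[rule], counts_test[rule]]
        p_value = chi2_contingency(table)[1]
        if p_value < alpha:          # distributions not coincident -> PO
            return True
    return False                      # every rule coincident -> regard as non-PO

# Transition counts to the two possible following sensory inputs for one rule:
learning = {"xa": [1030, 1010]}       # observed under the random walk
test     = {"xa": [1500, 550]}        # observed under the test-mode policy
print(sensory_input_is_po(learning, test))   # True: the input is judged PO
```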
inputs S1 and X with p and 1-p probabilities, re-
Application to the Most Difficult spectively. States from S1 to Sn are sensed as n
Environmental Structure different sensory inputs. If we adjust n and p, the
average number of steps required to obtain a re-
Setting ward can be changed.
Figure 18. The environment in which PS-r* is compared with RA, PS-r, and SGA
Table 1. The results of the comparison with PS-r* and SGA in Figure 18
For POMDP environments where there is no type 2 confusion and only one type of reward, we have proved the Rationality Theorem of Profit Sharing (PS). Next, we have proved the Rationality Theorem of PS in multi-agent environments. Last, we have analyzed the behavior of PS-r, which is an abstract algorithm of PS, in POMDPs. Furthermore, we have proposed PS-r*, an extension of PS-r, to fit POMDP environments where there is a type 2 confusion and only one type of reward.

We have shown several numerical examples to illustrate how to use these XoL methods. We have also shown that the performance of PS-r* is not less than that of the random walk (RA), and that it exhibits exceptional potential to improve on RA while using less memory than previous memory-based approaches.

Our future projects include improving RA within PS-r*, extending XoL to environments with multi-dimensional rewards and penalties, and discovering efficient real-world applications.

REFERENCES

Abbeel, P., & Ng, A. Y. (2005). Exploration and apprenticeship learning in reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning (pp. 1-8).

Aberdeen, D., & Baxter, J. (2002). Scalable Internal-State Policy-Gradient Methods for POMDPs. In Proceedings of the Nineteenth International Conference on Machine Learning (pp. 3-10).

Baird, L., & Poole, D. (1999). Gradient Descent for General Reinforcement Learning. Advances in Neural Information Processing Systems, 11, 968–974.

Boutilier, C., & Poole, D. (1996). Computing Optimal Policies for Partially Observable Decision Processes using Compact Representations. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 1168-1175).

Chrisman, L. (1992). Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach. In Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 183-188).

Gosavi, A. (2004). A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis. Machine Learning, 55, 5–29. doi:10.1023/B:MACH.0000019802.64038.6c

Grefenstette, J. J. (1988). Credit Assignment in Rule Discovery Systems Based on Genetic Algorithms. Machine Learning, 3, 225–245. doi:10.1007/BF00113898
Jaakkola, T., Singh, S. P., & Jordan, M. I. (1994). Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems. Advances in Neural Information Processing Systems, 7, 345–352.

Kimura, H., Yamamura, M., & Kobayashi, S. (1995). Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 295-303).

Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-Critic Algorithms. Advances in Neural Information Processing Systems, 12, 1008–1014.

Liepins, G. E., Hilliard, M. R., Palmer, M., & Rangarajan, G. (1989). Alternatives for Classifier System Credit Assignment. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence (pp. 756-761).

McCallum, R. A. (1995). Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State. In Proceedings of the Twelfth International Conference on Machine Learning (pp. 387-395).

Merrick, K., & Maher, M. L. (2007). Motivated Reinforcement Learning for Adaptive Characters in Open-Ended Simulation Games. In Proceedings of the International Conference on Advances in Computer Entertainment Technology (pp. 127-134).

Miyazaki, K., & Kobayashi, S. (1998). Learning Deterministic Policies in Partially Observable Markov Decision Processes. In Proceedings of the Fifth International Conference on Intelligent Autonomous Systems (pp. 250-257).

Miyazaki, K., & Kobayashi, S. (2000). Reinforcement Learning for Penalty Avoiding Policy Making. In Proceedings of the 2000 IEEE International Conference on Systems, Man and Cybernetics (pp. 206-211).

Miyazaki, K., & Kobayashi, S. (2001). Rationality of Reward Sharing in Multi-agent Reinforcement Learning. New Generation Computing, 91, 157–172. doi:10.1007/BF03037252

Miyazaki, K., & Kobayashi, S. (2003). An Extension of Profit Sharing to Partially Observable Markov Decision Processes: Proposition of PS-r* and its Evaluation [in Japanese]. Journal of the Japanese Society for Artificial Intelligence, 18(5), 286–296. doi:10.1527/tjsai.18.286

Miyazaki, K., Yamamura, M., & Kobayashi, S. (1994). On the Rationality of Profit Sharing in Reinforcement Learning. In Proceedings of the Third International Conference on Fuzzy Logic, Neural Nets and Soft Computing (pp. 285-288).

Ng, A. Y., Harada, D., & Russell, S. J. (1999). Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping. In Proceedings of the Sixteenth International Conference on Machine Learning (pp. 278-287).

Ng, A. Y., & Russell, S. J. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the Seventeenth International Conference on Machine Learning (pp. 663-670).

Perkins, T. J. (2002). Reinforcement Learning for POMDPs based on Action Values and Stochastic Optimization. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (pp. 199-204).

Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learning Without State-Estimation in Partially Observable Markovian Decision Processes. In Proceedings of the Eleventh International Conference on Machine Learning (pp. 284-292).

Sutton, R. S. (1988). Learning to Predict by the Methods of Temporal Differences. Machine Learning, 3, 9–44. doi:10.1007/BF00115009
We derive the necessary and sufficient condition to suppress irrational rules in the POMDPs where there
is no type 2 confusion and the number of types of a reward is one. In the first, we consider the local
rationality that can suppress any irrational rule (Theorem A.1). It is derived by two lemmas (Lemma
A.1 and A.2). We characterize a conflict structure where it is the most difficult to suppress irrational
rules (Lemma A.1). For two conflict structures A and B, we say A is more difficult than B when the
class of reinforcement functions that can suppress any irrational rule of A is included in that of B. Then,
we derive the necessary and sufficient condition to suppress any irrational rule for the most difficult
conflict structure (Lemma A.2).
Lemma A.1. The most difficult conflict structure has only one irrational rule with a self-loop.
Figure 19 shows the most difficult conflict structure, in which only one irrational rule with a self-loop conflicts with L rational rules.
Although we set L=1, we can easily extend it to any number. Reinforcement of an irrational rule makes it
difficult to learn a rational policy under any reinforcement function. Therefore, the difficulty of a conflict
structure varies monotonically with the number of reinforcements for irrational rules. We enumerate
conflict structures according to the branching factor b, which is the number of state transitions in the same sensory input, and the conflict factor c, which is the number of conflicting rules in it, and examine the number of reinforcements for irrational rules.
b=1: It is clearly not difficult since there are no conflicts (Figure 20a).
b=2: When there are no conflicts (Figure 20b), it is the same as b=1. We divide the structures of c=2
into two subclasses: one contains a self-loop (Figure 20c), and the other does not (Figure 20d). In the
case given in Figure 20c, there is a possibility that the self-loop rule is repeatedly selected, while the
non-self-loop rule is selected at most once. Therefore, if the self-loop rule is irrational, it will be rein-
forced more than the irrational rule of Figure 20d.
b ≥ 3: When there are no conflicts (Figure 20e), it is the same as b=1. Consider the structure of c=2 (Figure 20f). Although the most difficult case is the one in which the conflict structure has an irrational rule as a self-loop, even such a structure is less difficult than Figure 20c. Considering the structure of c=3 (Figure 20g), two of the conflicting rules are irrational; therefore, the expected number of reinforcements for one irrational rule is less than that of Figure 20f.
Similarly, conflict structures of b>3 are less difficult than Figure 20c.
From the above discussion, it is concluded that the most difficult conflict structure is expressed in
Figure 20c. Q.E.D.
Lemma A.2. Only one irrational rule with a self-loop can be suppressed if and only if the suppression conditions hold.
Although we set L=1, we can easily extend it to any number. If the rational rule of the most difficult
conflict structure is reinforced by fN, the value of the reinforcement for the conflicting irrational rule
becomes maximal when it has been selected W-N times before the selection of the rational rule. Subse-
quently, the weight of the irrational rule is increased by fN+1 + ⋯ + fW, and that of the rational rule by fN.
From a viewpoint of rationality, the increased weight of the rational rule must be larger than that of the
irrational rule. Such a condition must hold for any later part of the reinforcement interval. Therefore,
suppression conditions are necessary. The sufficiency is evident. Q.E.D.
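The proof above compares the reinforcement f_N of the rational rule against the maximum later reinforcement f_{N+1} + ⋯ + f_W of the conflicting irrational rule. The sketch below is one way to check such a suppression condition for a concrete reinforcement function. The form of the condition (with a factor L for L conflicting rational rules) and the geometric example function f_i = R/M^i follow the standard Profit Sharing rationality condition and are our assumptions, not a restatement of the chapter's exact formulas.

```python
def suppresses_irrational_rules(f, L):
    """Check the suppression condition: for every position i,
    L * (f[i+1] + ... + f[W-1]) must stay strictly below f[i]."""
    return all(L * sum(f[i + 1:]) < f[i] for i in range(len(f) - 1))

# A geometrically decreasing reinforcement function is a standard choice in PS;
# with L = M - 1 it satisfies the condition.
M, R, W = 2, 100.0, 5
f = [R / M ** i for i in range(W)]          # [100.0, 50.0, 25.0, 12.5, 6.25]
print(suppresses_irrational_rules(f, L=M - 1))   # True
```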
Using the law of transitivity, the following theorem is directly derived from these lemmas.
Theorem A.1. Any irrational rule can be suppressed if and only if the suppression conditions hold.
Theorem A.1 guarantees that the local rationality in reinforcement learning will suppress any ir-
rational rule. A policy that is constructed by rational rules satisfies local rationality. However, we can
construct an example such that the policy is irrational. Next, we discuss the global rationality that can
learn a rational policy.
In the global rationality, the necessity of suppression conditions is evident. We investigate the suf-
ficiency (Lemma A.3).
Lemma A.3. If PS satisfies the suppression conditions, it can learn a rational policy in POMDPs where there is no type 2 confusion and there is only one type of reward.
Although we set L=2, we can easily extend the argument to any number. If a policy is irrational, it must contain a rewardless loop, i.e. a loop that includes no rewards. We need at least two episodes to construct a rewardless loop. Furthermore, the following inequalities must hold in the sensory inputs x and y that are the exits of the loop in Figure 21, where x_o and y_o are the rules that exit the loop, x_i and y_i are the rules that enter the loop, and Δ represents the total reinforcement accumulated in a rule.

If the episode containing x_i also contains x_o and the irrational rule suppression theorem is satisfied, then from Theorem A.1, w_{x_o} > w_{x_i}. Therefore, the episode containing x_i requires a rule other than x_o to exit the loop. The same applies to y_i. The following inequalities are then derived using Theorem A.1. There is no solution satisfying these inequalities with Δw > 0, so a rewardless loop cannot be constructed. Therefore, if we use reinforcement functions that satisfy the suppression conditions, we always obtain a rational policy. Q.E.D.

Theorem 1 is directly derived from this lemma. Q.E.D.
First, we derive the necessary and sufficient condition to suppress any irrational rule for the most dif-
ficult conflict structure. If it can be derived, we can extend it to any conflict structure by using the law
of transitivity.
From lemma A.1, the most difficult conflict structure has only one irrational rule with a self-loop. In
this structure, we can derive the following lemma.
Only one irrational rule with a self-loop in some goal-agent set can be suppressed if and only if Equation (3) holds.

For any reinforcement interval k (k=0,1,…,W−1) in some goal-agent set, we show that there is a j (j=1,2,…,L) satisfying

w_{ij}^k > w_{i0}^k,    (10)

where w_{ij}^k and w_{i0}^k are the weights of the j-th rational rule r_{ij}^k of agent i (i=0,1,…,n−1) and of the single irrational rule with a self-loop r_{i0}^k of that agent, respectively (Figure 22).

First, we consider the ratio of the number of selections of r_{ij}^k to that of r_{i0}^k. When n'=1 (the goal-agent set contains one agent) and the L rational rules of each agent are selected by all agents in turn, the minimum of this ratio is maximized (Figure 23); in this case the following ratio holds (Figure 24).

Second, we consider the weights given to r_{ij}^k and r_{i0}^k. When the agent that obtains the direct reward R senses no similar sensory input within W, the weight given to r_{ij}^k is minimized; it is R/M^(W−1), attained at k=W. On the other hand, when the agents that obtain the indirect reward sense the same sensory input within W, the weight given to r_{i0}^k is maximized; it is μR·(M/(M−1))·(1 − (1/M)^W0) for W ≥ W0.
Figure 24. Sample rule sequences for n=3 and L=3. Sequence 1 has some partial selection of rules, while sequence 2 does not; sequence 2 corresponds to Figure 23. Sequence 1 is learned more easily than sequence 2, as discussed for Figure 23.
Therefore, for condition (10) to be satisfied, the following condition must hold:

R/M^(W−1) > μR·(M/(M−1))·(1 − (1/M)^W0)·(n−1)·L,    (12)

that is,

μ < (M−1) / ( M^W · (1 − (1/M)^W0) · (n−1) · L ).    (13)
First, we show the most difficult environmental structure, where the average number of steps required
to obtain a reward by Policy(PS-r) is the worst in comparison with RA (Lemma C.1). Next, we analyze
its behavior in the structure (Lemma C.2). If it can be derived, we can extend it to all the classes of the
POMDPs where the number of types of a reward is one.
Lemma C.1. The most difficult environmental structure is the environment shown in Figure 25, in which there exist M−1 actions that are the same as action-b. We term it structure W.
If all rules are rational, there is no difference between PS-r and RA. Therefore, we treat the case as one
where there is an irrational rule.
(i) M=2
In structure W, the rules constructed with action-a and action-b are regarded as irrational and rational, respectively. If we select action-a in any state other than state B, we approach a reward; on the other hand, if we select action-b in those states, we move to state A, which is furthest from a reward. Therefore, in structure W with M=2, selecting a rational rule increases the number of steps required to obtain a reward. In a structure with a weaker effect than structure W, the difference between PS-r and RA shrinks. Therefore, in the case of M=2, structure W requires the largest number of steps to obtain a reward by PS-r in comparison with RA.
(ii) M>2
First, we consider the case where additional irrational rules are added to structure W at M=2. If the selection probabilities of these rules are not zero, the average number of steps required to obtain a reward becomes larger than in structure W at M=2. In RA, the selection probability of such a rule is 1/M, whereas in PS-r it is less than 1/M. Therefore, comparing the same structure, the difference between RA and PS-r shrinks when additional irrational rules are added.

Next, we consider the case where additional rational rules are added to structure W at M=2. In this case, the average number of steps required to obtain a reward by PS-r, in comparison with RA, is largest when all the additional rules are the same as the rule constructed with action-b.
Therefore, structure W requires the largest number of steps to obtain a reward by PS-r in comparison
with RA. Q.E.D.
Lemma C.2. In structure W, the maximum value of the average number of steps required to obtain a reward by Policy(PS-r), divided by that of RA, is given by Equation (5).

In structure W, the average number of steps required to obtain a reward is

V_a = 1 / ( s·(1−s)^(n−1) ),

where s is the selection probability of action-b, of which there exist M−1 identical copies. We have s = (M−1)/M for RA and s = (M−1)/((M−1)+r) for PS-r. Therefore, calculating V_a for each s, we obtain the ratio

r·(1 + (M−1)/r)^n / M^n.   Q.E.D.
Theorem 3 is directly derived from these lemmas. Q.E.D.
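The closed form in Equation (5) can be checked numerically against the V_a expressions above. The sketch below is such a check; the function names and the particular values of M, r and n are ours.

```python
def average_steps(s, n):
    """V_a = 1 / (s * (1 - s)**(n - 1)) from Appendix C."""
    return 1.0 / (s * (1.0 - s) ** (n - 1))

def ratio_ps_r_to_ra(M, r, n):
    """Closed form of Equation (5): r * (1 + (M - 1) / r)**n / M**n."""
    return r * (1.0 + (M - 1.0) / r) ** n / M ** n

M, r, n = 2, 2.0, 3
s_ra   = (M - 1.0) / M                 # selection probability of action-b under RA
s_ps_r = (M - 1.0) / ((M - 1.0) + r)   # ... and under Policy(PS-r)
print(average_steps(s_ps_r, n) / average_steps(s_ra, n))   # 0.84375
print(ratio_ps_r_to_ra(M, r, n))                           # 0.84375
```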
Section 7
Miscellaneous
Chapter 16
Pheromone-Style Communication for Swarm Intelligence
Hidenori Kawamura
Hokkaido University, Japan
Keiji Suzuki
Hokkaido University, Japan
ABSTRACT
Pheromones are important chemical substances that allow social insects to realize cooperative collective behavior. The most famous example of pheromone-based behavior is foraging: real ants use a pheromone trail to inform each other where a food source exists, so that they can reach and forage it effectively. This sophisticated but simple communication method is useful for designing artificial multi-agent systems. In this chapter, evolutionary pheromone communication is proposed on a competitive ant environment model, and we show two patterns of pheromone communication that emerged through a co-evolutionary process driven by a genetic algorithm. In addition, such communication patterns are investigated with Shannon's entropy.
One good way of introducing such relationships is provided by nature. Real ants and bees are called social insects. Their colonies consist of many members who attend to various jobs to preserve the life of the colony (Sheely, 1995). Colonies are much too large for individual members, even queens, to comprehend all of their activities and information. In other words, although an individual's ability to assess the condition of the colony is limited, it can still do its work based only on this limited information. The total activities of a colony, e.g., defense against enemies, nest repair, childcare, and foraging, emerge from the aggregation of such individual behaviors. Moreover, colonies must balance the total work appropriately as the situation changes in order to optimize their operating costs. It is easy to see that communication between members is very important for attending to the many matters that each colony requires.

Many species of social insects have not only direct but also indirect communication channels, which are equally important and are realized by special chemical substances called "pheromones" (Agosta, 1992). A pheromone is a chemical that triggers a natural behavioral response in another member of the same species. When one member of an ant colony senses a particular internal condition or external stimulus, it responds by releasing a corresponding kind of pheromone into the environment. The pheromone spreads through the environment by natural processes, e.g., evaporation from the ground, diffusion in the air, and physical contact between members, or between members and their enemies. Because of this translation from the sender to the environment, the pheromone signal carries not only a message from the sender but also information about the current environment. When the receiver senses such a pheromone, it causes a particular reaction in the receiver due to the natural characteristics of the species. One kind of honeybee handles over thirty types of pheromones, including alarm pheromones that warn about enemy attacks, Nasanov pheromones for gathering mates, and queen pheromones that signal that the queen is alive.

A good example for understanding the relationship between pheromone communication and swarm intelligence is the foraging behavior of real ants. In the first stage of typical ant foraging, scouting worker ants individually begin searching for routes from the nest in random directions. When a scouting worker discovers a food source along its route, it picks up the food and brings it back to the nest while laying down a pheromone; consequently, it releases the first pheromone trail on the ground from the food source to the nest. The pheromone trail plays an important role in collective foraging behavior. If other workers around the nest find the pheromone trail, they try to follow it to the food source. These workers also discover the food source and then return to the nest while reinforcing the intensity of the trail. The intensity of the pheromone trail is successively reinforced by the large number of workers who continue to march to the food source until all the food is consumed. No ants reinforce the pheromone trail after the food has been removed; the pheromone gradually evaporates from the ground and the trail automatically dissipates into the air.

This type of sophisticated collective behavior can emerge from the combined effect of local decision-making, pheromone communication channels, and the natural characteristics of the environment. The mechanism based on such combined effects is called "stigmergy" in ethology. In the pheromone mechanism of stigmergy, a member releases a pheromone into the environment, the environment affects its propagation, and the other members detect the pheromone from the environment. This communication channel enables the entire colony to organize all of its members and achieve high-level tasks that require coordination and decentralization between them (Dorigo & Theraulaz, 1999).
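The deposit-reinforce-evaporate cycle just described is the core of most artificial stigmergy models. The sketch below is a generic illustration of that cycle, not the chapter's ant-war model (which is introduced in the next section); the constants, the grid representation and the function names are our assumptions.

```python
import random

EVAPORATION = 0.05   # fraction of pheromone that dissipates per time step
DEPOSIT = 1.0        # amount laid down by an ant returning with food

def evaporate(trail):
    """All trail intensities decay a little every step, so unused routes fade away."""
    return {cell: level * (1.0 - EVAPORATION) for cell, level in trail.items()}

def reinforce(trail, route):
    """An ant carrying food back to the nest strengthens every cell of its route."""
    for cell in route:
        trail[cell] = trail.get(cell, 0.0) + DEPOSIT
    return trail

def choose_next(trail, candidates):
    """Followers pick the next cell with probability proportional to trail intensity."""
    weights = [trail.get(c, 0.0) + 1e-6 for c in candidates]   # small floor keeps exploration
    return random.choices(candidates, weights=weights, k=1)[0]

trail = {}
trail = reinforce(trail, route=[(0, 1), (0, 2), (0, 3)])   # first scout found food
trail = evaporate(trail)
print(choose_next(trail, candidates=[(0, 1), (1, 0)]))      # usually follows the marked cell
```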
To investigate the effect of stigmergy with artificial pheromones, some researchers have tried to create models of swarming behavior. Collins et al. and Bennett studied the evolutionary design of foraging behavior with neural networks and genetic algorithms (Collins & Jeffersion, 1991; Bennett III, 1996). Nakamura et al. reported the relationship between global effects and local behavior in a foraging model with artificial ants (Nakamura & Kurumatani, 1997). Suzuki et al. demonstrated the possibility of pheromones resolving a deadlocked situation with food-scrambling agents (Suzuki & Ohuchi, 1997). These researchers revealed the possibility that artificial stigmergy based on pheromone models could be used to replicate swarm intelligence.

Some good applications of artificial stigmergy to practical problems have also been proposed. One of the most successful examples is the ant colony optimization (ACO) algorithm proposed by Dorigo et al. (Dorigo, Maniezzo & Colorni, 1991; Colorni, Dorigo & Maniezzo, 1991). ACO is a probabilistic technique for solving computational problems that can be reduced to finding good paths through graphs (Dorigo & Stutzle, 2004). In ACO, artificial pheromones are attached to each path to represent how useful the corresponding path is for constructing a solution to a given computational problem. The family of ACO algorithms performs well on various computational problems. In other research, Sauter et al. proposed the use of digital pheromones for controlling and coordinating swarms of unmanned vehicles, and demonstrated the effectiveness of these pheromone algorithms for surveillance, target acquisition, and tracking (Sauter, Matthews, Parunak & Brueckner, 2005). Mamei et al. proposed a simple, low-cost and general-purpose implementation of a pheromone-based interaction mechanism for pervasive environments using RFID tags (Mamei & Zambonelli, 2007). Sole et al. discussed how behavioral rules at the individual level with a pheromone model could produce optimal colony-level patterns (Sole, Bonabeau, Delgado, Fernandez & Marin, 2000). Ando et al. succeeded in predicting the future density of traffic congestion by using an artificial pheromone model on a road map (Ando, Masutani, Honiden, Fukazawa & Iwasaki, 2006). These researchers showed that artificial stigmergy with pheromone models is useful for multi-agent system applications.

This chapter focuses on the design of multi-agent systems that successfully complete given tasks through collective behavior and artificial stigmergy. We particularly selected the evolutionary design of pheromone communication. If an agent has a specific rule, fixed in advance, for emitting a pheromone, another agent should optimize its reaction to that pheromone to establish communication. Conversely, if an agent has a specific reaction to a pheromone fixed in advance, another agent should optimize when and where it emits the pheromone. Designing evolutionary pheromone communication is not easy in these cases, because the communication requires the reaction to the pheromone and the situation in which it is emitted to be specified simultaneously.

Section 2 proposes the ant war as a competitive environment for a task requiring effective agent communication. Section 3 explains the evolutionary design process for the agent system. Section 4 presents the setting of the computer simulation, and Section 5 concludes the chapter by clarifying the evolutionary process that forms artificial pheromone communication from the viewpoint of information theory.

ANT WAR AS COMPETITIVE ENVIRONMENT

An ant war is a competitive environment for two teams of ant-like agents (Kawamura, Yamamoto, Suzuki & Ohuchi, 1999; Kawamura, Yamamoto & Ohuchi, 2001). The environment is constructed on a 44×80 grid, with one team of blue ants and one team of red ants. Each team consists of 80 …
296
Pheromone-style Communication for Swarm Intelligence
Figure 1. The outline of ant war competition. This figure has been reproduced by permission from
Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary Pheromone Com-
munication Based on External Measurement and Emergence of Swarm Intelligence, Japanese Journal
of the Society of Instrument and Control Engineers, 37(5), 455–464.
Figure 2. The firing probability of pheromone sensory inputs. This figure has been reproduced by
permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
• Whether the agent contacts a mate to its right or not.
• Whether the agent contacts a mate to its left or not.
• Whether the agent contacts a mate behind it or not.

An additional four bits, i7 i8 i9 i10, stochastically respond to the aerial densities of the pheromone on the four neighboring grids. Here, let (x, y) be the position of agent k and (x′, y′) ∈ {(x − 1, y), (x + 1, y), (x, y − 1), (x, y + 1)} be a sensing position of agent k. The firing probability of each pheromone bit is defined as

$$P_{fire}(x', y') = \frac{1}{1 + \exp\left( - \dfrac{P(x', y', t) - P_{ave}(x, y, t)}{T} \right)},$$

$$P_{ave}(x, y, t) = \frac{P(x-1, y, t) + P(x+1, y, t) + P(x, y-1, t) + P(x, y+1, t)}{4},$$

where T is the parameter of sensitivity. The firing probability takes a higher value in response to relatively high-density pheromones (see Figure 2). Although each pheromone sensor only has a binary value, agent k can react according to the density of pheromones. Since the agent in this model must determine its actions from sensory information about its neighborhood only, effective communication through pheromone channels is important for winning this competition.
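To make the sensing rule above concrete, the following is a minimal sketch of how the four stochastic pheromone bits i7–i10 could be computed. It assumes the pheromone field is available as a callable P(x, y, t) returning the density on a grid cell; the function names and the default sensitivity T are illustrative and not taken from the original implementation.

```python
import math
import random

def p_ave(P, x, y, t):
    """Average pheromone density on the four grids neighboring (x, y)."""
    return (P(x - 1, y, t) + P(x + 1, y, t) + P(x, y - 1, t) + P(x, y + 1, t)) / 4.0

def fire_probability(P, x, y, xp, yp, t, T=1.0):
    """Firing probability of the pheromone bit sensing grid (xp, yp)."""
    return 1.0 / (1.0 + math.exp(-(P(xp, yp, t) - p_ave(P, x, y, t)) / T))

def sense_pheromone_bits(P, x, y, t, T=1.0):
    """Return the four stochastic sensor bits i7..i10 for an agent at (x, y)."""
    neighbors = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [1 if random.random() < fire_probability(P, x, y, xp, yp, t, T) else 0
            for (xp, yp) in neighbors]
```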
At time t, each individual agent can select one of seven actions: going forward, backward, left, or right, standing by in the current position, pulling a food pack, or laying a pheromone in the current position. If the agent wants to pull a food pack and actually apply force to the target food, it has to satisfy the condition that it touches the target food pack or a mate who is actually pulling the target food pack. The food pack is moved in a tug of war: more agents can move the food pack more quickly. A food pack crossing the goal line is removed from the environment and the team is awarded a score.

The competition is finished when the time t expires or all the food packs have crossed the goal line. The winner is the team that has collected more food packs than its opponent.
The output of the agent's neural network for the k-th action is given by

$$o_k = \left[ 1 + \exp\left( -\sum_{j=1}^{11} w_{jk}\, i_j \right) \right]^{-1} \Bigg/ \; \sum_{h=1}^{7} \left[ 1 + \exp\left( -\sum_{j=1}^{11} w_{jh}\, i_j \right) \right]^{-1},$$

where i11 is the additional input and −1 is always set as the bias for the neural network.
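The output rule above can be read as a set of logistic units normalized over the seven actions. The sketch below shows one way to compute these outputs and to sample an action from them, assuming the normalized values o_k are used directly as selection probabilities; the data layout (weights[k][j] standing for w_jk) and the function names are assumptions made for illustration.

```python
import math
import random

N_INPUTS = 11   # i1..i10 are sensor bits, i11 = -1 is the bias input
N_ACTIONS = 7   # forward, backward, left, right, stand by, pull food, lay pheromone

def action_outputs(weights, inputs):
    """Normalized logistic outputs o_k over the seven actions.

    weights[k][j] corresponds to w_jk; inputs is i1..i11 with inputs[-1] == -1.
    """
    logistic = [1.0 / (1.0 + math.exp(-sum(w_k[j] * inputs[j] for j in range(N_INPUTS))))
                for w_k in weights]
    total = sum(logistic)
    return [o / total for o in logistic]

def select_action(weights, inputs):
    """Sample one of the seven actions according to o_k (an assumed usage)."""
    probs = action_outputs(weights, inputs)
    r, acc = random.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if r < acc:
            return k
    return N_ACTIONS - 1
```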
EVOLUTIONARY PROCESS

The main topic of this chapter is to explain how multi-agent systems organize artificial pheromone communication with evolutionary computation. The chromosomes in the computation are constructed from the set of weights w_jk. At the initial stage of evolution, all w_jk are initialized with random values from [−0.5, 0.5]. The evolutionary computation has N chromosomes, and these chromosomes evolve through a five-step procedure (see Figure 3):

• Step 1: Two chromosomes are randomly selected from the population.
• Step 2: Ant-agent teams with the selected chromosomes compete in the ant war. The loser is removed from the population.
• Step 3: The chromosomes of the winner are copied to two prototypes, and the chromosomes of these two are varied by crossover and mutation operations.
• Step 4: The two new chromosomes are returned to the population.
• Step 5: Go back to Step 1 until the final iteration.

The crossover operation exchanges the weight values at the same locus of two chromosomes with the probability Pc. The mutation operation adds noise from [−0.5, 0.5] to each weight with the probability Pm.
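The five-step procedure and the crossover/mutation operators described above can be sketched as a steady-state tournament loop such as the following. The compete callback, the genome length of 7 × 11 weights, and the way the two offspring re-fill the freed slots (to keep the population size constant) are assumptions made for illustration, not details taken from the original code.

```python
import random

PC, PM = 0.04, 0.08          # crossover / mutation probabilities used in the chapter
GENOME = 7 * 11              # one weight w_jk per (action, input) pair -- an assumption

def random_chromosome():
    return [random.uniform(-0.5, 0.5) for _ in range(GENOME)]

def crossover(a, b):
    """Exchange the weight values at the same locus of two chromosomes with probability Pc."""
    for locus in range(GENOME):
        if random.random() < PC:
            a[locus], b[locus] = b[locus], a[locus]

def mutate(chromosome):
    """Add noise from [-0.5, 0.5] to each weight with probability Pm."""
    for locus in range(GENOME):
        if random.random() < PM:
            chromosome[locus] += random.uniform(-0.5, 0.5)

def evolve(population, compete, iterations):
    """compete(a, b) plays one ant war and returns the winning chromosome."""
    for _ in range(iterations):
        a, b = random.sample(population, 2)        # Step 1: pick two chromosomes
        winner = compete(a, b)                     # Step 2: ant war; loser is removed
        loser = b if winner is a else a
        population.remove(loser)
        proto1, proto2 = winner[:], winner[:]      # Step 3: copy the winner twice...
        crossover(proto1, proto2)                  # ...and vary by crossover and mutation
        mutate(proto1)
        mutate(proto2)
        population.remove(winner)                  # assumption: the two offspring take both
        population.extend([proto1, proto2])        # freed slots, keeping the size constant
    return population                              # Step 5: repeat until the final iteration
```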
EXPERIMENT

A computer experiment was carried out with 10 trials.
Figure 4. An example distribution of obtained pheromones. This figure has been reproduced by permis-
sion from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary Pheromone
Communication Based on External Measurement and Emergence of Swarm Intelligence, Japanese
Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
The maximum simulation time in each ant-war environment was 2000. The parameters of the artificial pheromones, Q, γeva, and γdif, were set to 50, 0.1, and 0.1, respectively. There were 20 chromosomes, and the final generation of the evolutionary process was 50000. The parameters for the evolutionary operations, Pc and Pm, were set to 0.04 and 0.08, respectively. These settings were determined through various preliminary experiments.
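The pheromone parameters Q, γeva and γdif suggest a deposit–evaporate–diffuse update of the pheromone field, but the exact update equations are not reproduced in this excerpt. The following is therefore only a speculative sketch of one common discrete scheme that is consistent with those parameters; the grid layout, the four-neighbor split of the diffused amount, and the border handling are all assumptions.

```python
def update_pheromone_field(P, deposits, gamma_eva=0.1, gamma_dif=0.1, Q=50.0):
    """One illustrative update of a 2-D pheromone field P (list of lists of floats).

    deposits is an iterable of (x, y) cells where an ant lays pheromone this step.
    """
    h, w = len(P), len(P[0])
    for (x, y) in deposits:                                  # release Q units of pheromone
        P[y][x] += Q
    new_P = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            new_P[y][x] += P[y][x] * (1.0 - gamma_eva - gamma_dif)  # evaporation into the air
            share = P[y][x] * gamma_dif / 4.0                       # diffusion to 4 neighbors
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    new_P[ny][nx] += share
    for y in range(h):
        P[y] = new_P[y]
```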
RESULTS

Two types of pheromone communication were self-organized in the final generation over the 10 trials. We called these two the attractive and repulsive pheromones. There were seven attractive pheromones and three repulsive pheromones.

Attractive pheromone: An ant based on this type of pheromone basically walks randomly and goes about exploring the environment when not sensing a special stimulus. Once it senses contact with a food pack, the ant tries to carry it and releases the pheromone on the ground. The pheromone released by such an ant diffuses near the food pack, and its intensity in the environment is based on the food pack. The other ants who have sensed the pheromone try to reach the food pack effectively, based on the pheromone's level of intensity. This type of pheromone attracts their mates, and we therefore called it an "attractive pheromone." An example distribution of attractive pheromones is shown in Figure 4 (A).

Repulsive pheromone: An ant using this type of pheromone has a tendency to avoid it. When exploring, this ant scatters the pheromone on the ground. Once the ant finds a food pack, it stops releasing the pheromone. As a result of such behavior, the pheromone field stays away from the food pack and spreads over the rest of the environment. This means that the ants mark the space that has already been explored, and which is unnecessary to re-explore, with the pheromone, and they therefore save time in finding the food pack effectively. As this type of pheromone repulses their mates, we called it a "repulsive pheromone." An example distribution of repulsive pheromones is shown in Figure 4 (B).

Although an artificial evolutionary mechanism in which winners survive and losers disappear generated these two types of ant behavior, it is not clear whether they are sufficiently dominant in the evolutionary process. To evaluate the evolution of ant strategies, we depicted the winning percentage of 100 competitions between successive generations and the final generation.
Figure 5. The evolutionary transition of the winning percentage in the two types. The opponent of the
competition is the final generations. This figure has been reproduced by permission from Kawamura, H.,
Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary Pheromone Communication Based
on External Measurement and Emergence of Swarm Intelligence, Japanese Journal of the Society of
Instrument and Control Engineers, 37(5), 455–464.
The evolutionary transition of the winning percentage for the two types is shown in Figure 5. The X-axis indicates the number of generations, and the Y-axis indicates the winning percentage. The percentages of both types increase as the generations progress. This graph thus plots the evolutionary process by which dominant behavior emerges.

The tendencies of the two lines differ in terms of evolution speed, i.e., the attractive pheromone evolved quicker than the repulsive one. The difference in evolutionary speed may have been caused by the strength of the relationship between the out- …

We next measured the evolutionary process of communication by quantifying the effectiveness of the pheromone sensor inputs and the uniqueness of the situation in which the ant released the pheromone. Shannon's entropy and mutual information from information theory were selected to measure the degree of communication (Shannon & Weaver, 1964). Formally, the entropy of a discrete variable X is defined as:

$$H(X) = -\sum_{x \in X} p(x) \log p(x),$$
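The entropy and mutual-information measures described here can be estimated directly from logged simulation data. The sketch below computes H(X) as defined above and I(X, Y) = H(X) + H(Y) − H(X, Y); how the variables (pheromone sensor patterns, actions, release situations) are logged from the simulation is indicated only as a hypothetical usage note.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy H(X) = -sum p(x) log p(x), estimated from a list of observations."""
    counts = Counter(samples)
    n = float(len(samples))
    return -sum((c / n) * math.log(c / n, 2) for c in counts.values())

def mutual_information(xs, ys):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), estimated from paired observations."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Hypothetical usage: `pheromone_bits` could hold the sensed input patterns (i7..i10 as
# tuples) and `actions` the outputs chosen at the same steps; quantities such as I(O, P)
# and I(I, A) in the text would be estimated the same way from the corresponding pairs.
# mi = mutual_information(actions, pheromone_bits)
```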
Figure 6.The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
Figure 7. The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
Figure 8. The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
Figure 9. The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
…biased in specific patterns. After evolution, the ants scattered here and there to effectively search for a food pack, and the variations in the sensory inputs were wider than those in the early generations.

To investigate the evolutionary process of I(O, P) and I(I, A), we plotted scatter diagrams of the observed pairs of these values. The scatter diagrams for attractive-pheromone evolution and repulsive-pheromone evolution are shown in Figures 10 and 11. Here again, I(O, P) corresponds to the effectiveness of pheromone sensory inputs on decision-making, and I(I, A) corresponds to the uniqueness of sensory-input patterns for selecting pheromone-release actions. The graphs indicate that the values of I(O, P) and I(I, A) have a distinct positive correlation and that the values of each pair increase together step by step. This suggests that the situation's uniqueness in sending …
Figure 10. The scatter diagrams of mutual information for attractive-pheromone evolution. This figure has
been reproduced by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation
of Evolutionary Pheromone Communication Based on External Measurement and Emergence of Swarm
Intelligence, Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
Figure 11. The scatter diagrams of mutual information for repulsive-pheromone evolution. This figure has
been reproduced by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation
of Evolutionary Pheromone Communication Based on External Measurement and Emergence of Swarm
Intelligence, Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.
…computer simulations, i.e., ants with attractive and repulsive pheromones. Both types demonstrated rational strategies to win the competition, and these strategies effectively utilized the characteristics of artificial pheromones and the environment.

We introduced Shannon's entropy and mutual information on artificial pheromones to measure the uniqueness of the situation in sending pheromones and the uniqueness of the reaction in receiving pheromones. Such uniqueness represented two sides of the same coin, and artificial pheromone communication was gradually formed while the two increased at the same pace.

REFERENCES

Agosta, W. (1992). Chemical Communication – The Language of Pheromone. W. H. Freeman and Company.

Ando, Y., Masutani, O., Honiden, S., Fukazawa, Y., & Iwasaki, H. (2006). Performance of Pheromone Model for Predicting Traffic Congestion. In Proceedings of AAMAS 2006 (pp. 73–80).

Bennett, F., III. (1996). Emergence of a Multi-Agent Architecture and New Tactics for the Ant Colony Food Foraging Problem Using Genetic Programming. In From Animals to Animats 4, Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (pp. 430–439).

Bonabeau, E., Dorigo, M., & Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press.

Collins, R., & Jefferson, D. (1991). AntFarm: Towards Simulated Evolution. In Artificial Life II, Proceedings of the Second International Conference on Artificial Life (pp. 159–168).

Colorni, A., Dorigo, M., & Maniezzo, V. (1991). Distributed Optimization by Ant Colonies. In Proceedings of ECAL91 (pp. 134–142).

Dorigo, M., Maniezzo, V., & Colorni, A. (1991). Positive Feedback as a Search Strategy (Technical Report No. 91-016). Politecnico di Milano.

Dorigo, M., & Stutzle, T. (2004). Ant Colony Optimization. Cambridge, MA: The MIT Press.

Kawamura, H., Yamamoto, M., Suzuki, K., & Ohuchi, A. (1999). Ants War with Evolutive Pheromone Style Communication. In Advances in Artificial Life, ECAL'99 (LNAI 1674, pp. 639–643).

Kawamura, H., Yamamoto, M., & Ohuchi, A. (2001). Investigation of Evolutionary Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence. Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464. (in Japanese)

Mamei, M., & Zambonelli, F. (2007). Pervasive Pheromone-Based Interaction with RFID Tags. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2(2).

Nakamura, M., & Kurumatani, K. (1997). Formation Mechanism of Pheromone Pattern and Control of Foraging Behavior in an Ant Colony Model. In Proceedings of the Fifth International Workshop on the Synthesis and Simulation of Living Systems (pp. 67–74).

Sauter, J., Matthews, R., Parunak, H., & Brueckner, S. (2005). Performance of Digital Pheromones for Swarming Vehicle Control. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (pp. 903–910).

Seeley, T. (1995). The Wisdom of the Hive: The Social Physiology of Honey Bee Colonies. Harvard University Press.

Shannon, C., & Weaver, W. (1964). The Mathematical Theory of Communication. The University of Illinois Press.
Sole, R., Bonabeau, E., Delgado, J., Fernandez, P., & Marin, J. (2000). Pattern Formation and Optimization in Army Ant Raids. Artificial Life, 6(3), 219–226. doi:10.1162/106454600568843

Suzuki, K., & Ohuchi, A. (1997). Reorganization of Agents with Pheromone Style Communication in Multiple Monkey Banana Problem. In Proceedings of Intelligent Autonomous Systems 5 (pp. 615–622).
Chapter 17
Evolutionary Search for Cellular Automata with Self-Organizing Properties toward Controlling Decentralized Pervasive Systems
Yusuke Iwase
Nagoya University, Japan
Reiji Suzuki
Nagoya University, Japan
Takaya Arita
Nagoya University, Japan
ABSTRACT
Cellular Automata (CAs) have been investigated extensively as abstract models of the decentralized
systems composed of autonomous entities characterized by local interactions. However, it is poorly un-
derstood how CAs can interact with their external environment, which would be useful for implementing
decentralized pervasive systems that consist of billions of components (nodes, sensors, etc.) distributed in
our everyday environments. This chapter focuses on the emergent properties of CAs induced by external
perturbations toward controlling decentralized pervasive systems. We assumed a minimum task in which
a CA has to change its global state drastically after every occurrence of a perturbation period. In the
perturbation period, each cell state is modified by using an external rule with a small probability. By
conducting evolutionary searches for rules of CAs, we obtained interesting behaviors of CAs in which
their global state cyclically transited among different stable states in either ascending or descending
order. The self-organizing behaviors are due to the clusters of cell states that dynamically grow through
occurrences of perturbation periods. These results imply that we can dynamically control the global behaviors of decentralized systems by manipulating the states of randomly selected components only.
DOI: 10.4018/978-1-60566-898-7.ch017
The self-organizing behaviors of CAs can be dynamically affected by external influences. Several models of CAs that have such a property have been proposed for controlling decentralized pervasive systems (Figure 2) (Mamei et al., 2005; Kwak, Baryshnikov & Coffman, 2008). Decentralized pervasive systems consist of distributed components (nodes, sensors, etc.) in our everyday environments, and the systems perform global tasks using all of their components.

Figure 2. CAs with external perturbations aimed at controlling decentralized pervasive systems

Mamei et al. (2005) focused on the influences of the external world on decentralized pervasive systems, and constructed asynchronous CAs with external perturbations termed "dissipative cellular automata" (Roli & Zambonelli, 2002), in which the external environment can inject energy to dynamically influence the evolution of the automata. They regarded the asynchronous CA as a group of autonomous individuals that locally interact with each other, and introduced continuously occurring perturbations on the states of the cells into their model. The perturbations correspond to the influences caused by the external world on the group. The CAs presented regular patterns, such as a stripe pattern, if and only if they were induced by the perturbations. Moreover, they discussed applications of such self-organized features for controlling decentralized pervasive systems such as sensor networks. For example, based on the experiments on dissipative CAs, they showed possible scenarios in which the global state of the CA can be changed to the desired pattern only by filling up a limited area of the CA with a fixed pattern of states. However, these discussions were based on several hand-coded rules for CAs. Thus, it is still an open question how CAs can show emergent properties through interactions with an external world.

In general, there are complex relationships between the global behaviors of CAs and the local behaviors of cells, and it is difficult to hand-code rules for CAs that exhibit the desired emergent behaviors. Thus, there have been various studies based on evolutionary searches for rules of CAs that can exhibit emergent behaviors (Mitchell, Crutchfield & Hraber, 1994). Rocha (2004) evolved CA rules that can solve the density task, and discussed the nature of the memory-like interactions among the particles of cell states that emerged for storing and manipulating information. Ninagawa (2005) also evolved CAs which generate 1/f noise, where the power is inversely proportional to the frequency, and obtained a CA whose behavior is similar to the Game of Life. As above, an evolutionary search for rules of CAs will also be useful for understanding the characteristics of systems that exhibit self-organizing properties caused by interactions with an external environment.

This chapter focuses on the emergent properties of CAs induced by external perturbations toward controlling decentralized pervasive systems. We assumed a minimum task in which a CA has to change its global state after every occurrence of a perturbation period, and searched for rules of
CAs which can solve the task by using a genetic algorithm (GA) (Iwase, Suzuki & Arita, 2007). We obtained rules for CAs in which the global state of the evolved CAs cyclically transited among different stable states whose number is larger than that of distinct cell states, and looked into the self-organizing properties by which a drastic change in the global state occurs every two successive occurrences of perturbation periods.

TASK

We constructed a task that requires self-organizing behaviors induced by external perturbations to solve, and we searched for the rules of CAs solving the task by using a GA. In an evaluation process of a CA, there is a fixed number of perturbation periods in which each cell state is modified by using an external rule with a small probability. The CA has to change its configuration, represented by the distribution ratio of cell states, after every occurrence of a perturbation period. This is non-trivial if the number of perturbation periods is larger than the number of possible cell states, because the global behavior of the CA must then stably exhibit different configurations composed of intermediate distribution ratios. Thus the CAs need some kind of emergent property that utilizes the occurrences of perturbations effectively.

Cellular Automata and Perturbations

We adopt a two-dimensional (N × N), M-state, nine-neighbor (Moore neighborhood) cellular automaton with periodic boundary conditions as an abstract model of distributed systems composed of autonomous entities characterized by local interactions.

A cell (i, j) has a state q^t_{i,j} ∈ {0, …, M − 1} at time step t. At each time step t, each cell state is asynchronously updated with a probability Pa by

$$q^{t+1}_{i,j} = \begin{cases} \delta\!\left(S^{t}_{i,j}\right) & (P_a) \\ q^{t}_{i,j} & (1 - P_a), \end{cases} \qquad (1)$$

where δ is a local transition rule which maps a configuration of cell states in a neighborhood (the 3 × 3 cells around the focal cell (i, j)), S^t_{i,j}, into a state.

We also introduce an external perturbation ε which changes a cell state independently of δ. ε expresses a simple transition rule that increments the value of the cell state, as defined by

$$\varepsilon: \; q^{t+1}_{i,j} := \left(q^{t+1}_{i,j} + 1\right) \bmod M. \qquad (2)$$

ε is applied to each cell with a probability Pe after every transition of cell states by Equation 1. We expect that some relationship between the global behavior of the CA and the external perturbations can arise by introducing Equation 2, because the actual effect of a perturbation on the CA (a change in the cell state) is deterministic although it occurs probabilistically.

Transition and Evaluation

Figure 3 is a time diagram of an evaluation process for a rule (δ) of a CA described above. Starting from the initial condition in which each cell state is randomly assigned, the transitions without perturbations (Pe = 0.0) occur for Lpre steps so that the effects of the initial condition are eliminated. Next, a perturbation period of Ld steps occurs after every normal period of approximately Lint steps. For each cell, a perturbation occurs with a probability Pe = β during perturbation periods, and it does not occur during normal periods (Pe = 0). The evaluation stops when the normal periods have occurred D + 1 times. Note that the actual time step at which each perturbation period starts fluctuates randomly within a specific range (±Lfluct steps), as shown in Figure 3.
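Equations 1 and 2 can be implemented compactly. The sketch below uses one common reading of the asynchronous update, in which each cell is updated independently with probability Pa while reading the previous configuration, followed by the incrementing perturbation applied with probability Pe; the delta callback and the grid representation are assumptions made for illustration.

```python
import random

def step(grid, delta, M, Pa=0.2, Pe=0.0):
    """One time step of the N x N, M-state CA: asynchronous update (Eq. 1)
    followed by the incrementing perturbation (Eq. 2).

    grid is a list of lists of ints; delta maps the 3x3 Moore neighborhood
    (a tuple of 9 states, periodic boundary) to a new state.
    """
    n = len(grid)
    new_grid = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            if random.random() < Pa:          # cell updates with probability Pa (Eq. 1)
                neigh = tuple(grid[(i + di) % n][(j + dj) % n]
                              for di in (-1, 0, 1) for dj in (-1, 0, 1))
                new_grid[i][j] = delta(neigh)
    for i in range(n):
        for j in range(n):
            if random.random() < Pe:          # perturbation epsilon (Eq. 2)
                new_grid[i][j] = (new_grid[i][j] + 1) % M
    return new_grid
```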
The global configuration of the CA at step t is summarized by the density ρ^t(s) of each cell state s and the density distribution:

$$\rho^{t}(s) = \frac{1}{N \times N} \sum_{(i,j) \in N \times N} \begin{cases} 1 & \text{if } q^{t}_{i,j} = s \\ 0 & \text{otherwise}, \end{cases} \qquad \mathcal{P}^{t} = \left\{ \rho^{t}(0),\, \rho^{t}(1),\, \ldots,\, \rho^{t}(M-1) \right\}. \qquad (3)$$

Also, in order to stress the existence of a small amount of each cell state, we define a scaled density distribution ρ^t_{θ,φ} by

$$SF_{\theta,\varphi}(x) = \frac{1}{1 + e^{-(x-\theta)\times\varphi}}, \qquad \rho^{t}_{\theta,\varphi}(s) = SF_{\theta,\varphi}\!\left(\rho^{t}(s)\right), \qquad \mathcal{P}^{t}_{\theta,\varphi} = \left\{ \rho^{t}_{\theta,\varphi}(0),\, \rho^{t}_{\theta,\varphi}(1),\, \ldots,\, \rho^{t}_{\theta,\varphi}(M-1) \right\}, \qquad (4)$$

where SF_{θ,φ} is a sigmoid function which scales ρ^t(s) with a threshold θ, and φ is a parameter for the degree of scaling. Equation 4 means that if ρ^t(s) is larger than θ, ρ^t_{θ,φ}(s) becomes close to 1; otherwise, it becomes close to 0.

Here, we take ρ^t_{θ,φ} as the global configuration of the CA, and define the fitness by using ρ^t_{θ,φ} at the last steps of the normal periods (ρ^{a(0)}_{θ,φ}, ρ^{a(1)}_{θ,φ}, …, ρ^{a(D)}_{θ,φ} in Figure 3) as follows:

$$\left\| \mathcal{P}^{i}_{\theta,\varphi} - \mathcal{P}^{j}_{\theta,\varphi} \right\| = \left| \rho^{i}_{\theta,\varphi}(0) - \rho^{j}_{\theta,\varphi}(0) \right| + \left| \rho^{i}_{\theta,\varphi}(1) - \rho^{j}_{\theta,\varphi}(1) \right| + \cdots + \left| \rho^{i}_{\theta,\varphi}(M-1) - \rho^{j}_{\theta,\varphi}(M-1) \right|. \qquad (5)$$

Equation (5) defines the difference between two scaled density distributions as the sum of the absolute differences between the corresponding elements of the distributions. The fitness of δ is the sum of these differences over all possible pairs of the scaled density distributions at the last steps of the normal periods.

Thus, the CAs have to use an occurrence of a perturbation period as a trigger to change their own global configuration dynamically.
EVOLUTIONARY SEARCH BY GENETIC ALGORITHM

Transition Rule

We optimize the rules for CAs to maximize the fitness defined above by using a GA. We adopt a transition rule based on the number of each cell state in the neighborhood, expecting the emergence of interesting behaviors of the CA and a reduction of the search domain for the GA.

Figure 4 illustrates an example of the transition rules in the case of M = 3. This rule is an extended version of outer-totalistic rules. The pattern of the neighborhood configuration S^t_{i,j} at the cell (i, j) is given by …
4. Two offspring are generated from the pair based on a two-point crossover with a probability Pcrossover and a mutation of each gene with a probability Pmutation. A mutation changes g_l to a random value (0 ≤ g_l < M) except for the current value.
5. The E elites and I − E offspring form the population of the new generation, and the process goes back to step 2 until the generation reaches G.
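Steps 4 and 5 above translate directly into a two-point crossover and a per-gene mutation over the rule string. The sketch below assumes genes take values 0 ≤ g_l < M with M = 3, as in the experiments; the helper names are illustrative.

```python
import random

M = 3                                   # number of cell states; genes g_l take values 0..M-1
P_CROSSOVER, P_MUTATION = 0.75, 0.05    # Pcrossover and Pmutation used in the experiments

def two_point_crossover(parent_a, parent_b):
    """With probability Pcrossover, swap the segment between two random cut points."""
    child_a, child_b = parent_a[:], parent_b[:]
    if random.random() < P_CROSSOVER:
        i, j = sorted(random.sample(range(len(parent_a) + 1), 2))
        child_a[i:j], child_b[i:j] = child_b[i:j], child_a[i:j]
    return child_a, child_b

def mutate(chromosome):
    """Each gene mutates with probability Pmutation to a different value in 0..M-1."""
    for l, g in enumerate(chromosome):
        if random.random() < P_MUTATION:
            chromosome[l] = random.choice([v for v in range(M) if v != g])
    return chromosome
```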
EXPERIMENTAL RESULTS AND ANALYSES

Course of Evolution

We adopted the following parameter settings: N = 64, M = 3, Pa = 0.2, β = 0.01, D = 5, Lpre = 2048, Lint = 1024, Lfluct = 128, Ld = 8, θ = 0.1, φ = 100, I = 32, E = 8, G = 256, Pcrossover = 0.75 and Pmutation = 0.05.

We conducted 12 runs, and it turned out that the fitness reached approximately 27 in 10 runs. The fitness of the remaining 2 runs went up to about 21 or 25. Here, we focus on the run in which the fitness reached the best value among them.

Figure 5. The fitness of the population

Figure 5 shows the best, average and worst fitness at each generation. The best fitness was about zero in the initial population; it rapidly went up to approximately 19 until the 10th generation, and then eventually converged to approximately 27 around the 50th generation. Also, we see that the average fitness tended to be about half of the best fitness, and the worst fitness was almost zero throughout the experiment.

Figure 6. The difference between the scaled density distributions. The dots represent all values of Equation (5) measured when all individuals were evaluated in each generation.

As defined in Equation (6), the fitness is the sum of the differences between the scaled density distributions. To grasp the main reason for the increase in fitness through the experiment, we plotted all the differences measured during the evaluation processes of all individuals at each generation in Figure 6. After approximately the 30th generation, we see that the difference often became the maximum value of 3.0, which is obtained when the difference between mutually opposite distributions is measured. This clearly shows that the evolved CAs successfully exhibited the self-organizing behaviors in which their global configurations drastically changed after the occurrences of external perturbations.

Emergence of State Transition Cycles Induced by External Perturbations

In the previous section, we observed that the population successfully evolved, and individuals were expected to exhibit self-organizing behaviors induced by perturbations. Next, we analyze the behavior of the adaptive individuals in detail, and discuss their self-organizing properties.

Figure 7 illustrates the transitions of several indices during an evaluation of a typical individual
in the last generation of the same run as in Figure 5. The string of genes is "020110010011110120010111010121012010000000000" (note 2). The lower graph shows the transitions of the elements of the density distribution and the entropy (note 3) during the evaluation. The upper images also illustrate the configuration of cell states at the end of each period.

Through the preparation period, the configuration of the CA gradually changed from the random initial configuration to a stable one, which was characterized by the decrease in the entropy. At the end of the preparation period, there were small clusters of the state 1 in a sea of the state 0 (Figure 7 - 1). The configuration did not change through the first normal period (2).

Because most of the cell states were 0, the occurrences of the first perturbation period increased the density of the state 1 (3). Then, the clusters of 1 gradually expanded their size and finally occupied the whole configuration through the subsequent normal period (4).

In contrast, the effect of the second perturbation period was not strong. Although the density of the state 2 was increased (5), the global configuration did not change any further (6). However, the effect of the third perturbation period (7) caused a significant change in the global configuration (8). The clusters of the state 2 appeared, expanded their size, and finally occupied the whole configuration. As explained, we observed similar changes in the dominant cell state (12) every two occurrences of perturbation periods (9 - 11). The detailed analyses of the effects of perturbations on the local interactions among cells showed that if the number of the subsequent dominant cell state exceeds a certain threshold by perturbations, it begins to increase during the subsequent normal period.

Figure 8 is the trajectory of the density distribution during this evaluation. We see that the density distribution showed a triangle cycle on this space as a result of the emergent dynamics explained above. This cycle is ascending in that the value of the dominant cell state increases as the global configuration changes.

The global configuration of the CA in Figure 7 and Figure 8 showed cyclic transitions between 6 different configurations occupied by almost one cell state (see Figure 7 - 4) or two cell states (i.e., a number of state 2 in a sea of state 1, (6)). Because the scaling function increases the small density of a cell state, the actual differences in these scaled density distributions become large, and as a result, the differences between several pairs (i.e., (6) and (12)) become the highest. Also, in each normal period, the global configuration completely converges before the end of the period, as shown in Figure 7. Thus, we can say that the cyclic behavior emerged through the course of evolution because of these adaptive and stable properties.

On the other hand, we also observed another interesting rule at the last generation in other runs. The string of genes is "112211000100211001212112222120222200000000000". Its typical behavior is illustrated in Figure 9 and Figure 10. As we can see from this figure, the density …
…global configuration was decided by whether the density of the cell state can be increased by the scaling function of Equation 4 or not (note 4). For example, if a density distribution is {0.80, 0.15, 0.05} and the scaled density distribution is {1.00, 0.99, 0.01}, then the cell states 0 and 1 are regarded as existing on the configuration. We then measured the transition probabilities between the classes during the evaluation process.

The table in Figure 11 displays the average transition probabilities between configurations over 100 evaluations of the individual which showed the ascending cycle in the previous experiment. Each set of cell states in a row or column is the set of existent cell states in the corresponding class, and each value is the transition probability from the column class to the row class. The transition diagram on the right also visualizes the same transition probabilities, in which the line types of the arrows correspond to different ranges of the value. As shown by the table and diagram, all the transition probabilities corresponding to the ascending cycle {0} → {0, 1} → {1} → {1, 2} → {2} → {2, 0} → … were greater than 0.65. Thus, the ascending cycle with 6 different configurations is stable through the long-term evaluation.

Figure 11. The transition table and diagram for the global configuration (ascending cycle)

Figure 12. The transition table and diagram for the global configuration (descending cycle)
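The classification of global configurations and the measurement of transition probabilities described above can be reproduced along the following lines. The 0.075 cut-off follows note 4; everything else (the frozenset representation of a class, the counting scheme) is an assumption made for illustration.

```python
from collections import defaultdict

def existing_states(density_dist, threshold=0.075):
    """Classify a configuration by the set of cell states regarded as present.

    A state counts as existing when its density exceeds the fixed point of the
    scaling function around theta = 0.1 (approximately 0.075, per note 4).
    """
    return frozenset(s for s, d in enumerate(density_dist) if d > threshold)

def transition_probabilities(class_sequence):
    """Estimate P(next class | current class) from a sequence of per-period classes."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(class_sequence, class_sequence[1:]):
        counts[current][nxt] += 1
    return {current: {nxt: c / float(sum(nxts.values())) for nxt, c in nxts.items()}
            for current, nxts in counts.items()}
```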
The table in Figure 12 displays the transition probabilities between configurations of the individual which showed the descending cycle in the previous experiments. The transition probabilities corresponding to the descending cycle {0} → {2, 0} → {2} → {1, 2} → {1} → {0, 1} → … were approximately greater than 0.65, which is similar to the transition probabilities of the ascending cycle. The transition diagram clearly shows that this cycle is also stable while it is reversed compared to the previous one.

CONCLUSION

We have investigated emergent properties of CAs induced by external perturbations. We introduced the transitivity into an extended version of outer-totalistic rules of CAs, and adopted the scaled density distribution of cell states as the global configuration, expecting the emergence of self-organizing behaviors and a reduction of the search space for a GA. We assumed a minimal task in which a CA has to change its global state after every perturbation, and then searched for rules of CAs which can solve the task by using a GA. We obtained rules for the CA in which the global configuration cyclically transited among different stable configurations, and these stable configurations were composed of not only homogeneous but also heterogeneous cell states. As a result, the number of stable configurations became twice that of the possible cell states. These interesting results were obtained only when we introduced the transitivity (Equation (8)) into the rule of CAs. It should be emphasized that we found both ascending and descending cycles of global configurations even though a perturbation always increments the value of a cell state. Detailed analyses showed that the ascending cycle was due to the self-organizing …

Figure 13. The emergent behaviors of the CAs in which the cells performed a random walk on a two-dimensional space. The neighborhood of a cell is defined by all those cells that are within a specific distance. Each cell can be regarded as a unit of decentralized mobile robots. We adopted the same rule as that of the CA which exhibited an ascending cycle in Figure 7. The center graph shows the transitions of the elements of the density distribution during the evaluation. Each image also illustrates the configuration of cell states at the end of each period. Each circle denotes a cell, and its color is assigned according to its state. The occurrence of the second perturbation period increased the density of the state 0 (2, 3). Then, the clusters of 0 gradually expanded their size and finally occupied the whole configuration through the subsequent normal period (4).
…Conference on Cellular Automata for Research and Industry (ACRI 2002) (pp. 144–155).

Sakai, S., Nishinari, K., & Iida, S. (2006). A New Stochastic Cellular Automaton Model on Traffic Flow and Its Jamming Phase Transition. Journal of Physics A: Mathematical and General, 39(50), 15327–15339. doi:10.1088/0305-4470/39/50/002

Wolfram, S. (2002). A New Kind of Science. Wolfram Media Inc.

3. … where H_{i,j} is the entropy of the cell (i, j), and P_{i,j}(s) is the probability of the occurrence of the cell state s at (i, j) during each period.

4. Actually, we defined that the density can be increased if the density is larger than the x-value (approximately 0.075) at the intersection of y = SF_{0.1,100}(x) with y = x around θ = 0.1.
Compilation of References
Abbeel, P., & Ng, A. Y. (2005). Exploration and appren- Ando, Y., Masutani, O., Honiden, S., Fukazawa, Y., &
ticeship learning in reinforcement learning. In Proceedings Iwasaki, H. (2006). Performance of Pheromone Model
of the Twentyfirst International Conference on Machine for Predicting Traffic Congestion. In . Proceedings of
Learning (pp. 1-8). AAMAS, 2006, 73–80.
Abbott, A., Doering, C., Caves, C., Lidar, D., Brandt, Angeline, P. J., Sauders, G. M., & Pollack, J. B. (1994).
H., & Hamilton, A. (2003). Dreams versus Reality: Ple- An evolutionary algorithms that constructs recurrent
nary Debate Session on Quantum Computing. Quantum neural networks. IEEE Transactions on Neural Networks,
Information Processing, 2(6), 449–472. doi:10.1023/ 5, 54–65. doi:10.1109/72.265960
B:QINP.0000042203.24782.9a
Appleton-Young, L. (2008). 2008 real estate market
Aberdeen, D., & Baxter, J. (2002). Scalable Internal-State forecast. California Association of Realtors. Retrieved
Policy-Gradient Methods for POMDPs. In Proceedings December 2008, from http://bayareahousingreview.com/
of the Nineteenth International Conference on Machine wp-content/uploads/2008/02/ leslie_appleton_young
Learning (pp. 3-10). _preso _read-only1.pdf.
Acerbi, A., et al. (2007). Social Facilitation on the De- Arai, S. & Tanaka, N. (2006). Experimental Analysis
velopment of Foraging Behaviors in a Population of of Reward Design for Continuing Task in Multiagent
Autonomous Robots. In Proceedings of the 9th European Domains. Journal of Japanese Society for Artificial
Conference in Artificial Life (pp. 625-634). Intelligence, in Japanese, 13(5), 537-546.
Agogino, A. K., & Tumer, K. (2004). Unifying Temporal Aranha, C., & Iba, H. (2007). Portfolio Management by
and Structural Credit Assignment Problems. In Proceed- Genetic Algorithms with Error Modeling. In JCIS Online
ings of the Third International Joint Conference on Au- Proceedings of International Conference on Computa-
tonomous Agents and Multi-Agent Systems (pp. 980-987). tional Intelligence in Economics & Finance.
Agosta, W. (1992). Chemical Communication – The Arthur, W. B. (1993). On designing economic agents
Language of Pheromone. W. H. Freeman and Company. that behave like human agents. Journal of Evolutionary
Economics, 3, 1–22. doi:10.1007/BF01199986
Alfarano, S., Wagner, F., & Lux,T. (2004). Estimation of
Agent-Based Models: the case of an asymmetric herding Arthur, W. B., Holland, J. H., LeBaron, B., Palmer, R.
model. G., & Taylor, P. (1997). Asset Pricing under Endogenous
Expectations in an Artificial Stock Market. [Addison-
Ambler, S. (2008). Scaling Scrum – Meeting Real World
Wesley.]. The Economy as an Evolving Complex System,
Development Needs. Dr. Dobbs Journal. Retrieved
II, 15–44.
April 23, 2008 from http://www.drdobbsonline.net/
architect/207100381.
Atanassov, K. T. (1999). Intuitionistic Fuzzy Sets, Physica Bazerman, M. (1998). Judgment in Managerial Decision
Verlag. Heidelberg: Springer. Making. John Wiley & Sons.
Axelrod, R. (1997). The Complexity of Cooperation Becker, M., & Szczerbicka, H. (2005). Parameters In-
-Agent-Based Model of Competition and Collaboration. fluencing the Performance of Ant Algorithm Applied to
Princeton University Press. Optimisation of Buffer Size in Manufacturing. Industrial
Engineering and Management Systems, 4(2), 184–191.
Axtell, R. (2000). Why Agents? On the Varied Motiva-
tion For Agent Computing In the Social Sciences. The Beer, R. D. (1996). Toward the Evolution of Dynami-
Brookings Institution Center on Social and Economic cal Neural Networks for Minimally Cognitive. In From
Dynamics Working Paper, November, No.17. Animals to Animats 4: Proceedings of the Fourth Inter-
national Conference on Simulation of Adaptive Behavior
Bäck, T. (1996). Evolutionary Algorithms in Theory and
(pp. 421-429).
Practice: Evolution Strategies, Evolutionary Program-
ming, Genetic Algorithms. Oxford University Press. Benenti, G. (2004). Principles of Quantum Computation
and Information (Vol. 1). New Jersey: World Scientific.
Bagnall, A. J., & Smith, G. D. (2005). A Multi agent
Model of UK Market in Electricity Generation. IEEE Beni, G., & Wang, J. (1989). Swarm Intelligence in
Transactions on Evolutionary Computation, 522–536. Cellular Robotic Systems. In Proceed. NATO Advanced
doi:10.1109/TEVC.2005.850264 Workshop on Robots and Biological Systems, Tuscany,
Italy, June 26–30
Baird, L., & Poole, D. (1999). Gradient Descent for
General Reinforcement Learning. Advances in Neural Benjamin, D., Brown, S., & Shapiro, J. (2006). Who is ‘be-
Information Processing Systems, 11, 968–974. havioral’? Cognitive ability and anomalous preferences.
Levine’s Working Paper Archive 122247000000001334,
Baki, B., Bouzid, M., Ligęza, A., & Mouaddib, A. (2006).
UCLA Department of Economics.
A centralized planning technique with temporal constraints
and uncertainty for multi-agent systems. Journal of Ex- Bennett, F., III. (1996). Emergence of a Multi-Agent
perimental & Theoretical Artificial Intelligence, 18(3), Architecture and New Tactics for the Ant Colony Food
331–364. doi:10.1080/09528130600906340 Foraging Problem Using Genetic Programming. In From
Animals to Animats 4, Proceedings of the Fourth Interna-
Baldassarre, G., Nolfi, S., & Parisi, D. (2003).
tional Conference on Simulations of Adaptive Behavior
Evolving Mobile Robots Able to Display Collec-
(pp. 430–439).
tive Behaviours . Artificial Life, 9(3), 255–267.
doi:10.1162/106454603322392460 Binder, W. J., Hulaas, G., & Villazon, A. (2001). Portable
Resource Control in the J-SEAL2 Mobile Agent System. In
Baldassarre, G. (2007, June). Research on brain and be-
Proceedings of International Conference on Autonomous
haviour, and agent-based modelling, will deeply impact
Agents (pp. 222-223).
investigations on well-being (and theoretical economics).
Paper presented at International Conference on Policies Black, F., & Litterman, R. (1992, Sept/Oct). Global Port-
for Happiness, Certosa di Pontignano, Siena, Italy. folio Optimization. Financial Analysts Journal, 28–43.
doi:10.2469/faj.v48.n5.28
Barto, A. (1996). Muti-agent reinforcement learning and
adaptive neural networks. Retrieved December 2008, from Blynel, J., & Floreano, D. (2003). Exploring the T-Maze:
http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA3152 Evolving Learning-Like Robot Behaviors using CTRNNs.
66&Location=U2&doc =GetTRDoc.pdf. In Proceedings of the 2nd European Workshop on Evo-
lutionary Robotics (EvoRob’2003) (LNCS).
Boehm, B., & Turner, R. (2004). Balancing Agility and C.A.R. (2008). U.S. economic outlook: 2008. Retrieved
discipline: A Guide for the Perplexed. Addison-Wesley December 2008, from http://rodomino.realtor.org/
Press. Research.nsf/files/ currentforecast.pdf/$FILE/current-
forecast.pdf.
Bonabeau, E., Dorigo, M., & Theraulaz, G. (1999). Swarm
Intelligence from Natural to Artificial Systems. Oxford Callebaut, W., & Rasskin-Gutman, D. (Eds.). (2005).
University Press. Modularity: Understanding the development and evolution
of natural complex systems. MA: MIT Press.
Bornholdt, S. (2001). Expectation bubbles in a spin model
of markets. International Journal of Modern Physics C, Caplin, A., & Dean, M. (2008). Economic insights from
12(5), 667–674. doi:10.1142/S0129183101001845 ``neuroeconomic’’ data. The American Economic Review,
98(2), 169–174. doi:10.1257/aer.98.2.169
Bossaerts, P., Beierholm, U., Anen, C., Tzieropoulos, H.,
Quartz, S., de Peralta, R., & Gonzalez, S. (2008, Septem- Carnap, R., & Jeffrey, R. (1971). Studies in Inductive
ber). Neurobiological foundations for “dual system”’ Logics and Probability (Vol. 1, pp. 35–165). Berkeley,
theory in decision making under uncertainty: fMRI and CA: University of California Press.
EEG evidence. Paper presented at Annual Conference on
Casari, M. (2004). Can genetic algorithms explain ex-
Neuroeconomics, Park City, Utah.
perimental anomalies? An application to common prop-
Bostonbubble.com. (2007). S&P/Case-Shiller Boston erty resources. Computational Economics, 24, 257–275.
snapshot Q3 2007. Retrieved December 2008, from http:// doi:10.1007/s10614-004-4197-5
www.bostonbubble.com/forums/viewtopic.php?t=598.
Case, K., Glaeser, E., & Parker, J. (2000). Real estate
Boswijk H. P., Hommes C. H, & Manzan, S. (2004). and the macroeconomy. Brookings Papers on Economic
Behavioral Heterogeneity in Stock Prices. Activity, 2, 119–162. doi:.doi:10.1353/eca.2000.0011
Boutilier, C., & Poole, D. (1996). Computing Optimal Case, K., & Shiller, R. (1989). The efficiency of the
Policies for Partially Observable Decision Processes market for single-family homes. The American Economic
using Compact Representations. In Proceedings of the Review, 79, 125–137.
Thirteenth National Conference on Artificial Intelligence
Case, K., & Shiller, R. (1990). Forecasting prices and
(pp. 1168-1175).
excess returns in the housing market. American Real
Brocas, I., & Carrillo, J. (2008a). The brain as a hierarchi- Estate and Urban Economics Association Journal, 18,
cal organization. The American Economic Review, 98(4), 263–273. doi:.doi:10.1111/1540-6229.00521
1312–1346. doi:10.1257/aer.98.4.1312
Case, K., & Shiller, R. (2003). Is there a bubble in the
Brocas, I., & Carrillo, J. (2008b). Theories of the mind. housing market? Brookings Papers on Economic Activity,
American Economic Review: Papers\& Proceedings, 1, 299–342. doi:.doi:10.1353/eca.2004.0004
98(2), 175-180.
Chalkiadakis, G., & Boutilier, C. (2008). Sequential Deci-
Brunnermeier, M. K. (2001). Asset Pricing under sion Making in Repeated Coalition Formation under Un-
Asymmetric Information. Oxford University Press. certainty, In: Proc. of 7th Int. Conf. on Autonomous Agents
doi:10.1093/0198296983.001.0001 and Multi-agent Systems (AA-MAS 2008), Padgham,
Parkes, Müller and Parsons (eds.), May, 12-16, 2008,
Budhraja, V. S. (2001). California’s electricity crisis. IEEE
Estoril, Portugal, http://eprints.ecs.soton.ac.uk/15174/1/
Power Engineering Society Summer Meeting.
BayesRLCF08.pdf
Chan, N. T., LeBaron, B., Lo, A. W., & Poggio, T. (2008). Chen, S.-H., & Tai, C.-C. (2003). Trading restrictions,
Agent-based models of financial markets: A comparison price dynamics and allocative efficiency in double auction
with experimental markets. MIT Artificial Markets Proj- markets: an analysis based on agent-based modeling and
ect, Paper No. 124, September. Retrieved January 1, 2008, simulations. Advances in Complex Systems, 6(3), 283–302.
from http://citeseer.ist.psu.edu/chan99agentbased.html. doi:10.1142/S021952590300089X
Chang, T. J., Meade, N., Beasley, J. E., & Sharaiha, Y. M. Chen, S.-H., Zeng, R.-J., & Yu, T. (2009a). Analysis of
(2000). Heuristics for Cardinality Constrained Portfolio Micro-Behavior and Bounded Rationality in Double
Optimization . Computers & Operations Research, 27, Auction Markets Using Co-evolutionary GP . In Pro-
1271–1302. doi:10.1016/S0305-0548(99)00074-X ceedings of World Summit on Genetic and Evolutionary
Computation. ACM.
Charness, G., & Levin, D. (2005). When optimal choices
feel wrong: A laboratory study of Bayesian updating, Chen, S., & Yeh, C. (1996). Genetic programming learning
complexity, and affect. The American Economic Review, and the cobweb model . In Angeline, P. (Ed.), Advances in
95(4), 1300–1309. doi:10.1257/0002828054825583 Genetic Programming (Vol. 2, pp. 443–466). Cambridge,
MA: MIT Press.
Chattoe, E. (1998). Just how (un)realistic are evolutionary
algorithms as representations of social processes? Journal Chen, S.-H., Zeng, R.-J., & Yu, T. (2009). Co-evolving
of Artificial Societies and Social Simulation, 1. trading strategies to analyze bounded rationality in double
auction markets . In Riolo, R., Soule, T., & Worzel, B.
Chen, S. H., & Yeh, C. H. (2002). On the Emergent
(Eds.), Genetic Programming: Theory and Practice VI (pp.
Properties of Artificial Stock Markets: The Efficient
195–213). Springer. doi:10.1007/978-0-387-87623-8_13
Market Hypothesis and the Rational Expectations Hypoth-
esis. Journal of Behavior &Organization, 49, 217–239. Chen, S.-H., Zeng, R.-J., & Yu, T. (2008). Co-evolving
doi:10.1016/S0167-2681(02)00068-9 trading strategies to analyze bounded rationality in double
auction markets . In Riolo, R., Soule, T., & Worzel, B.
Chen, S.-H. (2008). Software-agent designs in econom-
(Eds.), Genetic Programming Theory and Practice VI
ics: An interdisciplinary framework. IEEE Computa-
(pp. 195–213). Springer.
tional Intelligence Magazine, 3(4), 18–22. doi:10.1109/
MCI.2008.929844 Chen, S.-H., & Chie, B.-T. (2007). Modularity, product
innovation, and consumer satisfaction: An agent-based
Chen, S.-H., & Chie, B.-T. (2004). Agent-based economic
approach . In Yin, H., Tino, P., Corchado, E., Byrne, W.,
modeling of the evolution of technology: The relevance
& Yao, X. (Eds.), Intelligent Data Engineering and Auto-
of functional modularity and genetic programming.
mated Learning (pp. 1053–1062). Heidelberg, Germany:
International Journal of Modern Physics B, 18(17-19),
Springer. doi:10.1007/978-3-540-77226-2_105
2376–2386. doi:10.1142/S0217979204025403
Chen, L., Xu, X., & Chen, Y. (2004). An adaptive ant
Chen, S.-H., & Huang, Y.-C. (2008). Risk preference,
colony clustering algorithm. In Proceedings of the Third
forecasting accuracy and survival dynamics: Simulations
IEEE International Conference on Machine Learning and
based on a multi-asset agent-based artificial stock market.
Cybernetics (pp. 1387-1392).
Journal of Economic Behavior & Organization, 67(3),
702–717. doi:10.1016/j.jebo.2006.11.006 Chen, S., & Yeh, C. (1995). Predicting stock returns
with genetic programming: Do the short-run nonlinear
regularities exist? In D. Fisher (Ed.), Proceedings of the
Fifth International Workshop on Artificial Intelligence
and Statistics (pp. 95-101). Ft. Lauderdale, FL.
Chen, S.-H., Chie, B.-T., & Tai, C.-C. (2001). Evolv- Colorni, A., Dorigo, M., & Maniezzo, V. (1991). Dis-
ing bargaining strategies with genetic programming: tributed Optimization by Ant Colonies. In . Proceedings,
An overview of AIE-DA Ver. 2, Part 2. In B. Verma & ECAL91, 134–142.
A. Ohuchi (Eds.), Proceedings of Fourth International
Colyvan, M. (2004). The Philosophical Significance of
Conference on Computational Intelligence and Multi-
Cox’s Theorem. International Journal of Approximate
media Applications (ICCIMA 2001) (pp. 55–60). IEEE
Reasoning, 37(1), 71–85. doi:10.1016/j.ijar.2003.11.001
Computer Society Press.
Colyvan, M. (2008). Is Probability the Only Coherent
Chen, X., & Tokinaga, S. (2006). Analysis of price
Approach to Uncertainty? Risk Analysis, 28, 645–652.
fluctuation in double auction markets consisting of
doi:10.1111/j.1539-6924.2008.01058.x
multi-agents using the genetic programming for learning.
Retrieved from https://qir.kyushuu.ac.jp/dspace/bitstream Commonweal of Australia. (2001). Economic Outlook.
/2324/8706/ 1/ p147-167.pdf. Retrieved December 2008, from http://www.budget.gov.
au/2000-01/papers/ bp1/html/bs2.htm.
China Bystanders. (2008). Bank profits trimmed by
subprime losses. Retrieved from http://chinabystander. Cont, R., & Bouchaud, J.-P. (2000). Herd behavior and
wordpress.com /2008/03/25/bank-profits-trimmed-by- aggregate fluctuations in financial markets. Macro-
subprime-losses/. economics Dynamics, 4, 170–196.
Chrisman, L. (1992). Reinforcement Learning with Per- Covaci, S. (1999). Autonomous Agent Technology. In
ceptual Aliasing: The Perceptual Distinctions Approach. Proceedings of the 4th international symposium on Au-
In Proceedings of the Tenth National Conference on tonomous Decentralized Systems. Washington, DC: IEEE
Artificial Intelligence (pp. 183-188). Computer Science Society.
Cincotti, S., Focardi, S., Marchesi, M., & Raberto, M. Croley, T., & Lewis, C. (2006).. . Journal of Great
(2003). Who wins? Study of long-run trader survival Lakes Research, 32, 852–869. doi:10.3394/0380-
in an artificial stock market. Physica A, 324, 227–233. 1330(2006)32[852:WADCTM]2.0.CO;2
doi:10.1016/S0378-4371(02)01902-7
D’Espagnat, B. (1999). Conceptual Foundation of Quan-
Cliff, D., Harvey, I., & Husbands, P. (1993). Explora- tum mechanics (2nd ed.). Perseus Books.
tions in Evolutionary Robotics . Adaptive Behavior, 2(1),
d’Acremont, M., & Bossaerts, P. (2008, September).
71–104. doi:10.1177/105971239300200104
Grasping the fundamental difference between expected
Cliff, D., & Bruten, J. (1997). Zero is not enough: On the utility and mean-variance theories. Paper presented at
lower limit of agent intelligence for continuous double Annual Conference on Neuroeconomics, Park City, Utah.
auction markets (Technical Report no. HPL-97-141).
Das, R., Hanson, J. E., Kephart, J. O., & Tesauro, G.
Hewlett-Packard Laboratories. Retrieved January 1, 2008,
(2001). Agent-human interactions in the continuous
from http://citeseer.ist.psu.edu/cliff97zero.html
double auction. In Proceedings of the 17th International
CME 2007. (n.d.). Retrieved December 2008, from http:// Joint Conference on Artificial Intelligence (IJCAI), San
www.cme.com/trading/prd/re/housing.html. Francisco. CA: Morgan-Kaufmann.
Collins, R., & Jeffersion, D. (1991). AntFarm: Towards De Long, J. B., Shleifer, A. L., Summers, H., & Wald-
Simulated Evolution. In Artificial Life II, Proceedings mann, R. J. (1991). The survival of noise traders in
of the Second International Conference on Artificial Life financial markets. The Journal of Business, 64(1), 1–19.
(pp. 159–168). doi:10.1086/296523
325
Compilation of References
Deneuburg, J., Goss, S., Franks, N., Sendova-Franks, Durlauf, S. N., & Young, H. P. (2001). Social Dynamics.
A., Detrain, C., & Chretien, L. (1991). The Dynamics of Brookings Institution Press.
Collective Sorting: Robot-Like Ant and Ant-Like Robot.
Easley, D., & Ledyard, J. (1993). Theories of price forma-
In Proceedings of First Conference on Simulation of Adap-
tion and exchange in double oral auction . In Friedman, D.,
tive Behavior: From Animals to Animats (pp. 356-363).
& Rust, J. (Eds.), The Double Auction Market-Institutions,
Cambridge: MIT Press.
Theories, and Evidence. Addison-Wesley.
Detterman, D. K., & Daniel, M. H. (1989). Correlations of
Economist.com. (2007). The world economy: Rocky
mental tests with each other and with cognitive variables
terrain ahead. Retrieved December 2008, from
are highest for low-IQ groups. Intelligence, 13, 349–359.
http://www.economist.com/ daily/news/displaystory.
doi:10.1016/S0160-2896(89)80007-8
cfm?storyid=9725432&top_story=1.
Devetag, G., & Warglien, M. (2003). Games and phone
Edmonds, B. (2002). Review of Reasoning about Ratio-
numbers: Do short-term memory bounds affect strategic
nal Agents by Michael Wooldridge. Journal of Artificial
behavior? Journal of Economic Psychology, 24, 189–202.
Societies and Social Simulation, 5(1). Retrieved from
doi:10.1016/S0167-4870(02)00202-7
http://jasss.soc.surrey.ac.uk/5/1/reviews/edmonds.html.
Devetag, G., & Warglien, M. (2008). Playing the wrong
Edmonds, B. (1998). Modelling socially intelligent
game: An experimental analysis of relational complexity
agents. Applied Artificial Intelligence, 12, 677–699.
and strategic misrepresentation. Games and Economic
doi:10.1080/088395198117587
Behavior, 62, 364–382. doi:10.1016/j.geb.2007.05.007
Ellis, C., Kenyon, I., & Spence, M. (1990). Occasional
Dimeas, A. L., & Hatziargyriou, N. D. (2007). Agent based
Publication of the London Chapter . OAS, 5, 65–124.
control of Virtual Power Plants. International Conference
on Intelligent Systems Applications to Power Systems. Elton, E., Gruber, G., & Blake, C. (1996). Survivorship
Bias and Mutual Fund Performance. Review of Financial
DiVincenzo, D. (1995). Quantum Computation. Science,
Studies, 9, 1097–1120. doi:10.1093/rfs/9.4.1097
270(5234), 255–261. doi:10.1126/science.270.5234.255
Epstein, J. M., & Axtell, R. (1996). Growing Artificial
DiVincenzo, D. (2000). The Physical Implementation
Societies Social Science From the The Bottom Up. MIT
of Quantum Computation. Experimental Proposals for
Press.
Quantum Computation. arXiv:quant-ph/0002077
Evolution Robotics Ltd. Homepage (2008). Retrieved
Dorigo, M., & Gambardella, L. M. (1996). Ant Colony
from http://www.evolution.com/
System: a Cooperative Learning Approach to the Traveling
Salesman . IEEE Transactions on Evolutionary Computa- Fagin, R., & Halpern, J. (1994). Reasoning about Knowl-
tion, 1(1), 53–66. doi:10.1109/4235.585892 edge and Probability. Journal of the ACM, 41(2), 340–367.
doi:10.1145/174652.174658
Dorigo, M., & Stutzle, T. (2004). Ant Colony Optimiza-
tion. Cambridge, MA: The MIT Press. Fair, R., & Jaffee, D. (1972). Methods of estimation for
markets in disequilibrium. Econometrica, 40, 497–514.
Dorigo, M., Maniezzo, V., & Colorni, A. (1991). Positive
doi:.doi:10.2307/1913181
Feedback as a Search Strategy (Technical Report No.
91-016). Politecnico di Milano. Fama, E. (1970). Efficient Capital Markets: A Review of
Theory and Empirical Work. The Journal of Finance, 25,
Duffy, J. (2006). Agent-based models and human subject
383–417. doi:10.2307/2325486
experiments . In Tesfatsion, L., & Judd, K. (Eds.), Hand-
book of Computational Economics (Vol. 2). North Holland.
326
Compilation of References
Feldman, J. (1962). Computer simulation of cognitive Gigerenzer, G., & Selten, R. (2002). Bounded Rationality.
processes . In Broko, H. (Ed.), Computer applications Cambridge: The MIT Press.
in the behavioral sciences. Upper Saddle River, NJ:
Gjerstad, S., & Dickhaut, J. (1998). Price formation in
Prentice Hall.
double auctions. Games and Economic Behavior, 22,
Ferber, J. (1999). Multi Agent Systems. Addison Wesley. 1–29. doi:10.1006/game.1997.0576
Feynman, R. (1982). Simulating physics with computers. Gode, D. K., & Sunder, S. (1993). Allocative efficiency
International Journal of Theoretical Physics, 21, 467. of markets with zero-intelligence traders: markets as a
doi:10.1007/BF02650179 partial substitute for individual rationality. The Journal
of Political Economy, 101, 119–137. doi:10.1086/261868
Figner, B., Johnson, E., Lai, G., Krosch, A., Steffener,
J., & Weber, E. (2008, September). Asymmetries in Gode, D., & Sunder, S. (1993). Allocative efficiency
intertemporal discounting: Neural systems and the direc- of markets with zero-intelligence traders: Market as a
tional evaluation of immediate vs future rewards. Paper partial substitute for individual rationality. The Journal
presented at Annual Conference on Neuroeconomics, of Political Economy, 101, 119–137. doi:10.1086/261868
Park City, Utah.
Goldberg, D. E. (1989). Genetic Algorithms in Search,
Fischhoff, B. (1991). Value elicitation: Is there anything Optimization and Machine Learning. Addison-Wesley.
in there? The American Psychologist, 46, 835–847.
Gomez, F. J. and Miikkulainen, R. (1999). Solving
doi:10.1037/0003-066X.46.8.835
Non-Markovian Control Tasks with Neuroevolution, In
Flament, C. (1963). Applications of graphs theory to Proceedings of the International Joint Conference on
group structure. London: Prentice Hall. Artificial Intelligence (pp. 1356-1361).
Freddie Mac. (2008a). CMHPI data. Retrieved Decem- Gosavi, A. (2004). A Reinforcement Learning Algorithm
ber 2008, from http://www.freddiemac.com/finance/ Based on Policy Iteration for Average Reward: Em-
cmhpi/#old. pirical Results with Yield Management and Convergence
Analysis. Machine Learning, 55, 5–29. doi:10.1023/
Freddie Mac. (2008b). 30-year fixed rate historical Tables.
B:MACH.0000019802.64038.6c
Historical PMMS® Data. Retrieved December 2008,
from http://www.freddiemac.com/pmms/pmms30.htm. Gottfredson, L. S. (1997). Mainstream science on intel-
ligence: An editorial with 52 signatories, history, and
Frederick, S., Loewenstein, G., & O’Donoghue, T.
bibliography. Intelligence, 24(1), 13–23. doi:10.1016/
(2002). Time discounting and time preference: A critical
S0160-2896(97)90011-8
review. Journal of Economic Literature, XL, 351–401.
doi:10.1257/002205102320161311 Grefenstette, J. J. (1988). Credit Assignment in Rule Dis-
covery Systems Based on Genetic Algorithms. Machine
Friedman, D. (1991). A simple testable model of double
Learning, 3, 225–245. doi:10.1007/BF00113898
auction markets. Journal of Economic Behavior & Orga-
nization, 15, 47–70. doi:10.1016/0167-2681(91)90004-H Gregg, L., & Simon, H. (1979). Process models and
stochastic theories of simple concept formation. In H.
Friedman, M. (1953). Essays in Positive Economics.
Simon, Models of Thought (Vol. I). New Haven, CT:
University of Chicago Press.
Yale Uniersity Press.
Fudenberg, D., & Levine, D. (2006). A dual-self model of
impulse control. The American Economic Review, 96(5),
1449–1476. doi:10.1257/aer.96.5.1449
327
Compilation of References
Grossklags, J., & Schmidt, C. (2006). Software agents and Hiroshi, I., & Masahito, H. (2006). Quantum Computation
market (in)efficiency—a human trader experiment. IEEE and Information. Berlin: Springer.
Transactions on System, Man, and Cybernetics: Part C .
Hisdal, E. (1998). Logical Structures for Representation
Special Issue on Game-theoretic Analysis & Simulation
of Knowledge and Uncertainty. Springer.
of Negotiation Agents, 36(1), 56–67.
Holland, J. H. (1975). Adaptation in Natural and Artificial
Group, C. M. E. (2007). S&P/Case-Shiller Price Index:
Systems. University of Michigan Press.
Futures and options. Retrieved December 2008, from
http://housingderivatives. typepad.com/housing_deriva- Hough, J. (1958). Geology of the Great Lakes. [Univ. of
tives/files/cme_housing _fact_sheet.pdf. Illinois Press.]. Urbana (Caracas, Venezuela), IL.
Gruber, M. J. (1996). Another Puzzle: The Growth in Housing Predictor. (2008). Independent real estate hous-
Actively Managed Mutual Funds. The Journal of Finance, ing forecast. Retrieved December 2008, from http://www.
51(3), 783–810. doi:10.2307/2329222 housingpredictor.com/ california.html.
Haji, K. (2007). Subprime mortgage crisis casts a Hunt, E. (1995). The role of intelligence in modern society.
global shadow – medium-term economic forecast (FY American Scientist, (July/August): 356–368.
2007~2017). Retrieved December 2008, from http://www.
Huynh, T. D., Jennings, N. R., & Shadbolt, N. R. (2006).
nli-research.co.jp/english/economics/2007/ eco071228.
An integrated trust and reputation model for open multi-
pdf.
agent systems. Journal of Autonomous agents and multi
Halpern, J. (2005). Reasoning about uncertainty. MIT agent systems.
Press.
Iacono, T. (2008). Case-Shiller® Home Price Index fore-
Hanaki, N. (2005). Individual and social learning. casts: Exclusive house-price forecasts based on Fiserv’s
Computational Economics, 26, 213–232. doi:10.1007/ leading Case-Shiller Home Price Indexes. Retrieved
s10614-005-9003-5 December 2008, from http://www.economy.com/home/
products/ case_shiller_indexes.asp.
Harmanec, D., Resconi, G., Klir, G. J., & Pan, Y. (1995).
On the computation of uncertainty measure in Dempster- Ingerson, T. E., & Buvel, R. L. (1984). Structure in Asyn-
Shafer theory. International Journal of General Systems, chronous Cellular Automata. Physica D. Nonlinear Phe-
25(2), 153–163. doi:10.1080/03081079608945140 nomena, 10, 59–68. doi:10.1016/0167-2789(84)90249-5
Harvey, I., Di Paolo, E., Wood, A., & Quinn, R., M., & Iwase, Y., Suzuki, R., & Arita, T. (2007). Evolutionary
Tuci, E. (2005). Evolutionary Robotics: A New Scientific Search for Cellular Automata that Exhibit Self-Organizing
Tool for Studying Cognition. Artificial Life, 11(3/4), Properties Induced by External Perturbations. In Proc.
79–98. doi:10.1162/1064546053278991 2007 IEEE Congress on Evolutionary Computation
(CEC2007) (pp. 759-765).
Harvey, I., Husbands, P., Cliff, D., Thompson, A., &
Jakobi, N. (1997). Evolutionary robotics: The sussex ap- Iyengar, S., & Lepper, M. (2000). When choice is demoti-
proach. Robotics and Autonomous Systems, 20, 205–224. vating: Can one desire too much of a good thing? Journal
doi:10.1016/S0921-8890(96)00067-X of Personality and Social Psychology, 79(6), 995–1006.
doi:10.1037/0022-3514.79.6.995
Hatziargyriou, N. D., Dimeas, A., Tsikalakis, A. G., Lopes,
J. A. P., Kariniotakis, G., & Oyarzabal, J. (2005). Manage- Jaakkola, T., Singh, S. P., & Jordan, M. I. (1994). Rein-
ment of Microgrids in Market Environment. International forcement Learning Algorithm for Partially Observable
Conference on Future Power Systems. Markov Decision Problems. Advances in Neural Informa-
tion Processing Systems, 7, 345–352.
328
Compilation of References
Jaeger, G. (2006). Quantum Information: An Overview. Kahneman, D., Diener, E., & Schwarz, N. (Eds.). (2003).
Berlin: Springer. Well-Being: The Foundations of Hedonic Psychology.
New York, NY: Russell Sage Foundation.
Jamison, J., Saxton, K., Aungle, P., & Francis, D. (2008).
The development of preferences in rat pups. Paper pre- Kahneman, D., Knetsch, J., & Thaler, R. (1990). Experi-
sented at Annual Conference on Neuroeconomics, Park mental tests of the endowment effect and the Coase theo-
City, Utah. rem. The Journal of Political Economy, 98, 1325–1348.
doi:10.1086/261737
Jayantilal, A., Cheung, K. W., Shamsollahi, P., & Bre-
sler, F. S. (2001). Market Based Regulation for the PJM Kahneman, D., Knetsch, J., & Thaler, R. (1991). Anoma-
Electricity Market. IEEE International Conference on lies: The endowment effect, loss aversion, and status
Innovative Computing for Power Electric Energy Meets quo bias. The Journal of Economic Perspectives, 5(1),
the Markets (pp. 155-160). 193–206.
Jevons, W. (1879). The Theory of Political Economy, Kahneman, D., Ritov, I., & Schkade, D. (1999). Economic
2nd Edtion. Edited and introduced by R. Black (1970). preferences or attitude expressions? An analysis of dollar
Harmondsworth: Penguin. responses to public issues. Journal of Risk and Uncertainty,
19, 203–235. doi:10.1023/A:1007835629236
Johnson, E., Haeubl, G., & Keinan, A. (2007). Aspects
of endowment: A query theory account of loss aversion Kahneman, D. (2003). Maps of Bounded Rational-
for simple objects. Journal of Experimental Psychol- ity: Psychology for Behavioral Economics. The
ogy. Learning, Memory, and Cognition, 33, 461–474. American Economic Review, 93(5), 1449–1475.
doi:10.1037/0278-7393.33.3.461 doi:10.1257/000282803322655392
Johnson, N., Jeffries, P., & Hui, P. M. (2003). Financial Kahneman, D., & Tversky, A. (1979). Prospect Theory
Market Complexity. Oxford. of Decisions under Risk. Econometrica, 47, 263–291.
doi:10.2307/1914185
Jurca, R., & Faltings, B. (2003). Towards Incentive-
Compatible Reputation Management. Trust, Reputation Kahneman, D., & Tversky, A. (1992). Advances in. pros-
and Security: Theories and Practice (LNAI 2631, pp. pect Theory: Cumulative representation of Uncertainty.
138-147). Journal of Risk and Uncertainty, 5.
Kaboudan, M. (2001). Genetically evolved models Kaizoji. T, Bornholdt, S. & Fujiwara.Y. (2002). Dynam-
and normality of their residuals. Journal of Economic ics of price and trading volume in a spin model of stock
Dynamics & Control, 25, 1719–1749. doi:.doi:10.1016/ markets with heterogeneous agent. Physica A.
S0165-1889(00)00004-X
Kambayashi, Y., & Takimoto, M. (2005). Higher-Order
Kaboudan, M. (2004). TSGP: A time series genetic pro- Mobile Agents for Controlling Intelligent Robots. Inter-
gramming software. Retrieved December 2008, from national Journal of Intelligent Information Technologies,
http://bulldog2.redlands.edu/ fac/mak_kaboudan/tsgp. 1(2), 28–42.
Kagan, H. (2006). The Psychological Immune System: Kambayashi, Y., Sato, O., Harada, Y., & Takimoto, M.
A New Look at Protection and Survival. Bloomington, (2009). Design of an Intelligent Cart System for Common
IN: AuthorHouse. Airports. In Proceedings of 13th International Symposium
on Consumer Electronics. CD-ROM.
Kagel, J. (1995). Auction: A survey of experimental re-
search . In Kagel, J., & Roth, A. (Eds.), The Handbook
of Experimental Economics. Princeton University Press.
329
Compilation of References
Kambayashi, Y., Tsujimura, Y., Yamachi, H., Takimoto, Kovalerchuk, B., & Vityaev, E. (2000). Data mining in fi-
M., & Yamamoto, H. (2009). Design of a Multi-Robot nance: advances in relational and hybrid methods. Kluwer.
System Using Mobile Agents with Ant Colony Cluster-
Kovalerchuk, B. (1990). Analysis of Gaines’ logic of
ing. In Proceedings of Hawaii International Conference
uncertainty, In I.B. Turksen (Ed.), Proceedings of NAFIPS
on System Sciences. IEEE Computer Society. CD-ROM
’90 (Vol. 2, pp. 293-295).
Kawamura, H., Yamamoto, M., & Ohuchi, A. (2001).
Koza, J. (1992). Genetic programming. Cambridge, MA:
(in Japanese). Investigation of Evolutionary Pheromone
The MIT Press.
Communication Based on External Measurement and
Emergence of Swarm Intelligence. Japanese Journal of Koza, J. R. (1992). Genetic Programming: On the Pro-
the Society of Instrument and Control Engineers, 37(5), gramming of Computers by Means of Natural Selection.
455–464. MIT Press.
Kawamura, H., & Yamamoto, M. Suzuki & Ohuchi, Krishna, V., & Ramesh, V. C. (1998). Intelligent
A. (1999). Ants War with Evolutive Pheromone Style agents for negotiations in market games. Part I. Model.
Communication. In Advances in Artificial Life, ECAL’99 IEEE Transactions on Power Systems, 1103–1108.
(LNAI 1674, pp. 639-643). doi:10.1109/59.709106
Kennedy, J., & Eberhert, R. C. (2001). Swarm Intelligence. Krishna, V., & Ramesh, V. C. (1998a). Intelligent agents
Morgan Kaufmann. for negotiations in market games. Part II. Application.
IEEE Transactions on Power Systems, 1109–1114.
Kepecs, A., Uchida, N., & Mainen, Z. (2008, September).
doi:10.1109/59.709107
How uncertainty boosts learning: Dynamic updating of
decision strategies. Paper presented at Annual Conference Kuhlmann, G., & Stone, P. (2003). Progress in learning
on Neuroeconomics, Park City, Utah. 3 vs. 2 keepaway. In Proceedings of the RoboCup-2003
Symposium.
Kimura, H., Yamamura, M., & Kobayashi, S. (1995).
Reinforcement Learning by Stochastic Hill Climbing on Kuhnen, C., & Chiao, J. (2008, September). Genetic
Discounted Reward. In Proceedings of the Twelfth Inter- determinants of financial risk taking. Paper presented at
national Conference on Machine Learning (pp. 295-303). Annual Conference on Neuroeconomics, Park City, Utah.
Klucharev, V., Hytonen, K., Rijpkema, M., Smidts, A., & Kwak, K. J., Baryshnikov, Y. M., & Coffman, E. G. (2008).
Fernandez, G. (2008, September). Neural mechanisms of Self-Organizing Sleep-Wake Sensor Systems. In Proc.
social decisions. Paper presented at Annual Conference the 2nd IEEE International Conference on Self-Adaptive
on Neuroeconomics, Park City, Utah. and Self-Organizing Systems (SASO2008) (pp. 393-402).
Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-Critic Kyle, A. S., & Wang, A. (1997). Speculation Duopoly
Algorithms. Advances in Neural Information Processing with Agreement to Disagree: Can Overconfidence Survive
Systems, 12, 1008–1014. the Market Test? The Journal of Finance, 52, 2073–2090.
doi:10.2307/2329474
Koritarov, V. S. (2004). Real-World Market Representa-
tion with Agents (pp. 39–46). IEEE Power and Energy Laibson, D. (1997). Golden eggs and hyperbolic discount-
Magazine. ing. The Quarterly Journal of Economics, 12(2), 443–477.
doi:10.1162/003355397555253
Kovalerchuk, B. (1996). Context spaces as necessary
frames for correct approximate reasoning. Interna-
tional Journal of General Systems, 25(1), 61–80.
doi:10.1080/03081079608945135
330
Compilation of References
Lam, K., & Leung, H. (2004). An Adaptive Strategy Lewis, C. (2007).. . Journal of Paleolimnology, 37,
for Trust/ Honesty Model in Multi-agent Semi- com- 435–452. doi:10.1007/s10933-006-9049-y
petitive Environments. In Proceedings of the 16 th
Lichtenstein, S., & Slovic, P. (Eds.). (2006). The Construc-
IEEE International Conference on Tools with Artificial
tion of Preference. Cambridge, UK: Cambridge University
Intelligence(ICTAI 2004)
Press. doi:10.1017/CBO9780511618031
Larsen, C. (1999). Cranbrook Institute of Science . Bul-
Lieberman, M. (2003). Reflective and reflexive judgment
letin, 64, 1–30.
processes: A social cognitive neuroscience approach . In
Lasseter, R., Akhil, A., Marnay, C., Stephens, J., Dagle, Forgas, J., Williams, K., & von Hippel, W. (Eds.), Social
J., Guttromson, R., et al. (2002, April). White paper on Judgments: Explicit and Implicit Processes (pp. 44–67).
Integration of consortium Energy Resources. The CERTS New York, NY: Cambridge University Press.
MicroGrid Concept. CERTS, CA, Rep.LBNL-50829.
Liepins, G. E., Hilliard, M. R., Palmer, M., & Rangarajan,
Le Baron, B. (2001). A builder’s guide to agent-based G. (1989). Alternatives for Classifier System Credit As-
financial markets. Quantitative Finance, 1(2), 254–261. signment. In Proceedings of the Eleventh International
doi:10.1088/1469-7688/1/2/307 Joint Conference on Artificial Intelligent (pp. 756-761).
LeBaron, B. (2000). Agent-based Computational Finance: Lin, C. C., & Liu, Y. T. (2008). Genetic Algorithms for
Suggested Readings and Early Research. Journal of Portfolio Selection Problems with Minimum Transaction
Economics & Control, 24, 679–702. doi:10.1016/S0165- Lots. European Journal of Operational Research, 185(1),
1889(99)00022-6 393–404. doi:10.1016/j.ejor.2006.12.024
LeBaron, B., Arthur, W. B., & Palmer, R. (1999). Time Lin, C.-H., Chiu, Y.-C., Lin, Y.-K., & Hsieh, J.-C. (2008,
Series Properties of an Artificial Stock Market. Journal September). Brain maps of Soochow Gambling Task. Pa-
of Economics & Control, 23, 1487–1516. doi:10.1016/ per presented at Annual Conference on Neuroeconomics,
S0165-1889(98)00081-5 Park City, Utah.
Lerner, J., Small, D., & Loewenstein, G. (2004). Heart Liu, Y., & Yao, X. (1996). A Population-Based Learning
strings and purse strings: Carry-over effects of emotions Algorithms Which Learns Both Architectures and Weights
on economic transactions. Psychological Science, 15, of Neural Networks. Chinese Journal of Advanced Soft-
337–341. doi:10.1111/j.0956-7976.2004.00679.x ware Research, 3(1), 54–65.
Levy, M., Levy, H., & Solomon, S. (2000). Microscopic Lo, A. (2005). Reconciling efficient markets with be-
Simulation of Financial Markets. Academic Press. havioral finance: The adaptive market hypothesis. The
Journal of Investment Consulting, 7(2), 21–44.
Levy, M. Levy, H., & Solomon, S. (2000). Microscopic
Simulation of Financial Markets: From Investor Behav- Loewenstein, G. (1988). Frames of mind in intertemporal
ior to Market Phenomena. San Diego: Academic Press. choice. Management Science, 34, 200–214. doi:10.1287/
mnsc.34.2.200
Lewandowsky, S., Oberauer, K., Yang, L.-X., & Ecker,
U. (2009). A working memory test battery for Matlab. Loewenstein, G. (2005). Hot-cold empathy gaps and
under prepartion for being submitted to the Journal of medical decision making. Health Psychology, 24(4),
Behavioral Research Method. S49–S56. doi:10.1037/0278-6133.24.4.S49
331
Compilation of References
Loewenstein, G., & Schkade, D. (2003). Wouldn’t it Mamei, M., Roli, A., & Zambonelli, F. (2005). Emergence
be nice?: Predicting future feelings . In Kahneman, D., and Control of Macro-Spatial Structures in Perturbed
Diener, E., & Schwartz, N. (Eds.), Hedonic Psychology: Cellular Automata, and Implications for Pervasive Com-
The Foundations of Hedonic Psychology (pp. 85–105). puting Systems. IEEE Trans. Systems, Man, and Cyber-
New York, NY: Russell Sage Foundation. netics . Part A: Systems and Humans, 35(3), 337–348.
doi:10.1109/TSMCA.2005.846379
Loewenstein, G., & O’Donoghue, T. (2005). Animal
spirits: Affective and deliberative processes in economic Mamei, M. & Zambonelli, F. (2007). Pervasive pher-
behavior. Working Paper. Carnegie Mellon University, omone-based interaction with RFID tags. ACM Trans-
Pittsburgh. actions on Autonomous and Adaptive Systems (TAAS)
archive, 2(2).
Logenthiran, T., Srinivasan, D., & Wong, D. (2008).
Multi-agent coordination for DER in MicroGrid. IEEE Manson, S. M. (2006). Bounded rationality in agent-
International Conference on Sustainable Energy Tech- based models: experiments with evolutionary programs.
nologies (pp. 77-82). International Journal of Geographical Information Sci-
ence, 20(9), 991–1012. doi:10.1080/13658810600830566
Louie, K., Grattan, L., & Glimcher, P. (2008). Value-
based gain control: Relative reward normalization in Markowitz, H. (1952). Portfolio Selection. The Journal
parietal cortex. Paper presented at Annual Conference of Finance, 7, 77–91. doi:10.2307/2975974
on Neuroeconomics, Park City, Utah.
Markowitz, H. (1987). Mean-Variance Analysis in
Lovis, W. (1989). Michigan Cultural Resource Investiga- Portfolio Choice and Capital Market. New York: Basil
tions Series 1, East Lansing. Blackwell.
Ludwig, A., & Torsten, S. (2001). The impact of stock Marr, C., & Hütt, M. T. (2006). Similar Impact of To-
prices and house prices on consumption in OECD coun- pological and Dynamic Noise on Complex Patterns.
tries. Retrieved December 2008, from http://www.vwl. Physics Letters. [Part A], 349, 302–305. doi:10.1016/j.
uni-mannheim.de/brownbag/ludwig.pdf. physleta.2005.08.096
Lumer, E. D., & Faieta, B. (1994). Diversity and Adapta- McCallum, R. A. (1995). Instance-Based Utile Distinc-
tion in Populations of Clustering Ants. In From Animals to tions for Reinforcement Learning with Hidden State. In
Animats 3: Proceedings of the 3rd International Conference Proceedings of the Twelfth International Conference on
on the Simulation of Adaptive Behavior (pp. 501-508). Machine Learning (pp. 387-395).
Cambridge: MIT Press.
McClure, S., Laibson, D., Loewenstein, G., & Cohen,
Lux, T., & Marchesi, M. (1999). Scaling and criticality J. (2004). Separate neural systems value immediate
in a stochastic multi-agent model of a financial market. and delayed monetary rewards. Science, 306, 503–507.
Nature, 397, 498–500. doi:10.1038/17290 doi:10.1126/science.1100907
MacLean, P. (1990). The Triune Brain in Evolution: Role Merrick, K., & Maher, M. L. (2007). Motivated Reinforce-
in Paleocerebral Function. New York, NY: Plenum Press. ment Learning for Adaptive Characters in Open-Ended
Simulation Games. In Proceedings of the International
Malkiel, B. (1995). Returns from Investing in Equity
Conference on Advanced in Computer Entertainment
Mutual Funds 1971 to 1991. The Journal of Finance, 50,
Technology (pp. 127-134).
549–572. doi:10.2307/2329419
332
Compilation of References
Mitchell, M., Crutchfield, J. P., & Hraber, P. T. (1994). Mondada, F., & Floreano, D. (1995). Evolution of neural
Evolving Cellular Automata to Perform Computations: control structures: Some experiments on mobile robots.
Mechanisms and Impediments. Physica D. Nonlinear Phe- Robotics and Autonomous Systems, 16(2-4), 183–195.
nomena, 75, 361–391. doi:10.1016/0167-2789(94)90293- doi:10.1016/0921-8890(96)81008-6
3
Money, C. N. N. com (2008). World economy on thin ice
Miyazaki, K., & Kobayashi, S. (2001). Rationality of - U.N.: The United Nations blames dire situation on the
Reward Sharing in Multi-agent Reinforcement Learning. decline of the U.S. housing and financial sectors. Retrieved
New Generation Computing, 91, 157–172. doi:10.1007/ December 2008, from http://money.cnn.com/2008/05 /15/
BF03037252 news/ international/global_economy.ap/.
Miyazaki, K., & Kobayashi, S. (2003). An Extension of Montero, J., Gomez, D., & Bustine, H. (2007). On the
Profit Sharing to Partially Observable Markov Decision relevance of some families of fuzzy sets. Fuzzy Sets and
Processes: Proposition of PS-r* and its Evaluation. [in Systems, 16, 2429–2442. doi:10.1016/j.fss.2007.04.021
Japanese]. Journal of the Japanese Society for Artificial
Moody’s. Economy.com (2008). Case-Shiller® Home
Intelligence, 18(5), 286–296. doi:10.1527/tjsai.18.286
Price Index forecasts. Moody’s Analytics, Inc. Retrieved
Miyazaki, K., & Kobayashi, S. (1998). Learning Deter- December 2008, from http://www.economy.com/home/
ministic Policies in Partially Observable Markov Decision products/case_shiller_indexes.asp.
Processes. In Proceedings of the Fifth International Con-
Moore, T., Rea, D., Mayer, L., Lewis, C., & Dobson,
ference on Intelligent Autonomous System (pp. 250-257).
D. (1994).. . Canadian Journal of Earth Sciences, 31,
Miyazaki, K., & Kobayashi, S. (2000). Reinforcement 1606–1617. doi:10.1139/e94-142
Learning for Penalty Avoiding Policy Making. In Pro-
Murphy, R. R. (2000). Introduction to AI robotics. Cam-
ceedings of the 2000 IEEE International Conference on
bridge: MIT Press.
Systems, Man and Cybernetics (pp. 206-211).
Nagata, H., Morita, S., Yoshimura, J., Nitta, T., & Tainaka,
Miyazaki, K., Yamaumra, M., & Kobayashi, S. (1994).
K. (2008). Perturbation Experiments and Fluctuation
On the Rationality of Profit Sharing in Reinforcement
Enhancement in Finite Size of Lattice Ecosystems: Un-
Learning. In Proceedings of the Third International Con-
certainty in Top-Predator Conservation. Ecological Infor-
ference on Fuzzy Logic, Neural Nets and Soft Computing
matics, 3(2), 191–201. doi:10.1016/j.ecoinf.2008.01.005
(pp. 285-288).
Nagata, T., Takimoto, M., & Kambayashi, Y. (2009).
Modigliani, F., & Miller, M. H. (1958). The Cost of Capital,
Suppressing the Total Costs of Executing Tasks Using
Corporation Finance and the Theory of Investment. The
Mobile Agents. In Proceedings of the 42nd Hawaii Inter-
American Economic Review, 48(3), 261–297.
national Conference on System Sciences, IEEE Computer
Mohr, P., Biele, G., & Heekeren, H. (2008, September). Society. CD-ROM.
Distinct neural representations of behavioral risk and
Nakamura, M., & Kurumatani, K. (1997). Formation
reward risk. Paper presented at Annual Conference on
Mechanism of Pheromone Pattern and Control of Forag-
Neuroeconomics, Park City, Utah.
ing Behavior in an Ant Colony Model. In Proceedings
Monaghan, G., & Lovis, W. (2005). Modeling Archaeo- of the Fifth International Workshop on the Synthesis and
logical Site Burial in Southern Michigan. East Lansing, Simulation of Living Systems (pp. 67 -74).
MI: Michigan State Univ. Press.
333
Compilation of References
National Association of Home Builders, The Hous- Orito, Y., Takeda, M., & Yamamoto, H. (2009). Index
ing Policy Department. (2005). The local impact of Fund Optimization Using Genetic Algorithm and Scatter
home building in a typical metropolitan area: In- Diagram Based on Coefficients of Determination. Studies
come, jobs, and taxes generated. Retrieved December in Computational Intelligence: Intelligent and Evolution-
2008, from http://www.nahb.org/fileUpload_details. ary Systems, 187, 1–11.
aspx?contentTypeID=3&contentID= 35601& subCon-
Orito, Y., & Yamamoto, H. (2007). Index Fund Optimi-
tentID=28002.
zation Using a Genetic Algorithm and a Heuristic Local
NeuroSolutionsTM (2002). The Neural Network Simula- Search Algorithm on Scatter Diagrams. In Proceedings
tion Environment. Version 3, NeuroDimensions, Inc., of 2007 IEEE Congress on Evolutionary Computation
Gainesville, FL. (pp. 2562-2568).
Ng, A. Y. Ng & Russell, S. (2000). Algorithms for Inverse Palmer, R. G., Arthur, W. B., Holland, J. H., LeBaron, B., &
Reinforcement Learning. In Proceedings of 17th Interna- Tayler, P. (1994). Artificial economic life: a simple model
tional Conference on Machine Learning (pp. 663-670). of a stock market. Physica D. Nonlinear Phenomena,
Morgan Kaufmann, San Francisco, CA. 75(1-3), 264–274. doi:10.1016/0167-2789(94)90287-9
Ng, A. Y., Harada, D., & Russell, S. J. (1999). Policy Palmer, R. G., Arthur, W. B., Holland, J. H., LeBaron,
Invariance Under Reward Transformations: Theory and B., & Tayler, P. (1994). Artificial economic life: A simple
Application to Reward Shaping. In Proceedings of the model of a stock market. Physica D. Nonlinear Phenom-
Sixteenth International Conference on Machine Learn- ena, 75, 264–274. doi:10.1016/0167-2789(94)90287-9
ing (pp. 278-287).
Patel, J. (2007). A Trust and Reputation Model For Agent-
Nielsen, M., & Chuang, I. (2000). Quantum Computa- Based Virtual Organizations. Phd thesis in the faculty of
tion and Quantum Information. Cambridge: Cambridge Engineering and Applied Science School of Electronics
University Press. and Computer Sciences University of South Hampton
January 2007.
Ninagawa, S., Yoneda, M., & Hirose, S. (1997). Cellular
Automata in Dissipative Boundary Conditions [in Japa- Paulsen, D., Huettel, S., Platt, M., & Brannon, E. (2008,
nese]. Transactions of Information Processing Society of September). Heterogeneity in risky decision making
Japan, 38(4), 927–930. in 6-to-7-year-old children. Paper presented at Annual
Conference on Neuroeconomics, Park City, Utah.
Ninagawa, S. (2005). Evolving Cellular Automata by 1/f
Noise. In Proc. the 8th European Conference on Artificial Payne, J., Bettman, J., & Johnson, E. (1993). The Adaptive
Life (ECAL2005) (pp. 453-460). Decision Maker. Cambridge University Press.
Oh, K. J., Kim, T. Y., & Min, S. (2005). Using Genetic Pearson, J., Hayden, B., Raghavachari, S., & Platt, M.
Algorithm to Support Portfolio Optimization for Index (2008) Firing rates of neurons in posterior cingulate
Fund Management . Expert Systems with Applications, cortex predict strategy-switching in a k-armed bandit task.
28, 371–379. doi:10.1016/j.eswa.2004.10.014 Paper presented at Annual Conference on Neuroeconom-
ics, Park City, Utah.
Ohkura, K., Yasuda, T., Kawamatsu, Y., Matsumura, Y.,
& Ueda, K. (2007). MBEANN: Mutation-Based Evolv- Perkins, T. J. (2002). Reinforcement Learning for POM-
ing Artificial Neural Networks. In Proceedings of the 9th DPs based on Action Values and Stochastic Optimization.
European Conference in Artificial Life (pp. 936-945). In Proceedings of the Eighteenth National Conference on
Artificial Intelligence (pp. 199-204).
334
Compilation of References
Praça, I., Ramos, C., Vale, Z., & Cordeiro, M. (2003). Resconi, G., & Jain, L. (2004). Intelligent agents. Springer
MASCEM: A Multi agent System That Simulates Com- Verlag.
petitive Electricity Markets. IEEE International confer-
Resconi, G., Klir, G. J., Harmanec, D., & St. Clair, U.
ence on Intelligent Systems (pp. 54-60).
(1996). Interpretation of various uncertainty theories us-
Preuschoff, K., Bossaerts, P., & Quartz, S. (2006). ing models of modal logic: a summary. Fuzzy Sets and
Neural Differentiation of Expected Reward and Risk in Systems, 80, 7–14. doi:10.1016/0165-0114(95)00262-6
Human Subcortical Structures. Neuron, 51(3), 381–390.
Resconi, G., Klir, G. J., & St. Clair, U. (1992). Hierar-
doi:10.1016/j.neuron.2006.06.024
chical uncertainty metatheory based upon modal logic.
Priest, G., & Tanaka, K. Paraconsistent Logic. (2004). International Journal of General Systems, 21, 23–50.
Stanford Encyclopedia of Philosophy. http://plato.stan- doi:10.1080/03081079208945051
ford.edu/entries/logic-paraconsistent.
Resconi, G., Murai, T., & Shimbo, M. (2000). Field
Principe, J., Euliano, N., & Lefebvre, C. (2000). Neural Theory and Modal Logic by Semantic field to make
and Adaptive Systems: Fundamentals through Simula- Uncertainty Emerge from Information. Interna-
tions. New York: John Wiley & Sons, Inc. tional Journal of General Systems, 29(5), 737–782.
doi:10.1080/03081070008960971
Pushkarskaya, H., Liu, X., Smithson, M., & Joseph, J.
(2008, September). Neurobiological responses in individu- Resconi, G., & Turksen, I. B. (2001). Canonical Forms
als making choices in uncertain environments: Ambiguity of Fuzzy Truthoods by Meta-Theory Based Upon Modal
and conflict. Paper presented at Annual Conference on Logic. Information Sciences, 131, 157–194. doi:10.1016/
Neuroeconomics, Park City, Utah. S0020-0255(00)00095-5
Quinn, M., & Noble, J. (2001). Modelling Animal Behav- Resconi, G., & Kovalerchuk, B. (2006). The Logic of
iour in Contests: Tactics, Information and Communication. Uncertainty with Irrational Agents In Proc. of JCIS-
In Advances in Artificial Life: Sixth European Conference 2006 Advances in Intelligent Systems Research, Taiwan.
on Artificial Life (ECAL 01), (LNAI). Atlantis Press
Raberto, M., Cincotti, S., Focardi, M., & Marchesi, M. Resconi, G., Klir, G.J., St. Clair, U., & Harmanec, D.
(2001). Agent-based simulation of a financial market. (1993). The integration of uncertainty theories. Intern. J.
Physica A, 299(1-2), 320–328. doi:10.1016/S0378- Uncertainty Fuzziness knowledge-Based Systems, 1, 1-18.
4371(01)00312-0
Reynolds, R. G., & Ali, M. (2008). Computing with the
Rahman, S., Pipattanasomporn, M., & Teklu, Y. (2007). In- Social Fabric: The Evolution of Social Intelligence within
telligent Distributed Autonomous Power System (IDAPS). a Cultural Framework. IEEE Computational Intelligence
IEEE Power Engineering Society General Meeting. Magazine, 3(1), 18–30. doi:10.1109/MCI.2007.913388
Ramchurn, S. D., Huynh, D., & Jennings, N. R. (2004). Reynolds, R. G., Ali, M., & Jayyousi, T. (2008). Mining
Trust in multiagent Systems. The Knowledge Engineering the Social Fabric of Archaic Urban Centers with Cultural
Review, 19(1), 1–25. doi:10.1017/S0269888904000116 Algorithms. IEEE Computer, 41(1), 64–72.
Rangel, A., Camerer, C., & Montague, R. (2008). A Rocha, L. M. (2004). Evolving Memory: Logical Tasks
framework for studying the neurobiology of value-based for Cellular Automata. In Proc. the 9th International
decision making. Nature Reviews. Neuroscience, 9, Conference on the Simulation and Synthesis of Living
545–556. doi:10.1038/nrn2357 Systems (ALIFE9) (pp. 256-261).
335
Compilation of References
Roli, A., & Zambonelli, F. (2002). Emergence of Macro Sato, O., Ugajin, M., Tsujimura, Y., Yamamoto, H., &
Spatial Structures in Dissipative Cellular Automata. Kambayashi, Y. (2007). Analysis of the Behaviors of
In Proc. the 5th International Conference on Cellular Multi-Robots that Implement Ant Colony Clustering
Automata for Research and Industry (ACRI2002) (pp. Using Mobile Agents. In Proceedings of the Eighth Asia
144-155). Pacific Industrial Engineering and Management System.
CD-ROM.
Roth, A. E., & Ockenfels, A. (2002). Last-minute bid-
ding and the rules for ending second-price auction: Satoh, I. (1999). A Mobile Agent-Based Framework for
evidence from Ebay and Amazon auctions on the Inter- Active Networks. In Proceedings of IEEE Systems, Man,
net. The American Economic Review, 92, 1093–1103. and Cybernetics Conference (pp. 161-168).
doi:10.1257/00028280260344632
Sauter, J., Matthews, R., Parunak, H., & Brueckner, S.
Ruspini, E. H. (1999). A new approach to clustering. (2005). Performance of digital pheromones for swarming
Information and Control, 15, 22–32. doi:10.1016/S0019- vehicle control. In Proceedings of the fourth international
9958(69)90591-9 joint conference on Autonomous agents and multiagent
systems (pp. 903-910).
Russell, S., & Norvig, P. (1995). Artificial Intelligence.
Prentice-Hall. Schlosser, A., Voss, M., & Bruckner, L. (2004). Com-
paring and evaluating metrics for reputation systems by
Rust, J., Miller, J., & Palmer, R. (1994). Characterizing
simulation. Paper presented at RAS-2004, A Workshop
effective trading strategies: Insights from a computer-
on Reputation in Agent Societies as part of 2004 IEEE/
ized double auction tournament. Journal of Economic
WIC/ACM International Joint Conference on Intelligent
Dynamics & Control, 18, 61–96. doi:10.1016/0165-
Agent Technology (IAT’04) and Web Intelligence (WI’04),
1889(94)90069-8
Beijing China, September 2004.
Rust, J., Miller, J., & Palmer, R. (1993). Behavior of trad-
Schwartz, B. (2003). The Paradox of Choice: Why More
ing automata in a computerized double auction market . In
Is Less. New York, NY: Harper Perennial.
Friedman, D., & Rust, J. (Eds.), Double Auction Markets:
Theory, Institutions, and Laboratory Evidence. Redwood Shahidehpour, M., & Alomoush, M. (2001). Restructured
City, CA: Addison Wesley. Electrical Power Systems: Operation, Trading, and Vola-
tility. Marcel Dekker Inc.
Rutledge, R., Dean, M., Caplin, A., & Glimcher, P. (2008,
September). A neural representation of reward predic- Shahidehpour, M., Yamin, H., & LI Z. (2002). Market
tion error identified using an axiomatic model. Paper Operations in Electric Power Systems: Forecasting,
presented at Annual Conference on Neuroeconomics, Scheduling, and Risk Management. Wiley-IEEE Press.
Park City, Utah.
Shannon, C., & Weaver, W. (1964). The Mathematical
Sakai, S., Nishinari, K., & Iida, S. (2006). A New Theory of Communication. The University of Illinois Press.
Stochastic Cellular Automaton Model on Traffic Flow
Sharot, T., De Martino, B., & Dolan, R. (2008, September)
and Its Jamming Phase Transition. Journal of Physics.
Choice shapes, and reflects, expected hedonic outcome.
A, Mathematical and General, 39(50),15327–15339.
Paper presented at Annual Conference on Neuroeconom-
doi:10.1088/0305-4470/39/50/002
ics, Park City, Utah.
Samanez Larkin, G., Kuhnen, C., & Knutson, B. (2008).
Sharpe, W. F. (1964). Capital Asset Prices: A Theory of
Financial decision making across the adult life span. Pa-
Market Equilibrium under condition of Risk. The Journal
per presented at Annual Conference on Neuroeconomics,
of Finance, 19, 425–442. doi:10.2307/2977928
Park City, Utah.
336
Compilation of References
Sheely, T. (1995). The Wisdom of the Hive: The Social Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learn-
Physiology of Honey Bee Colonies. Harvard University ing Without State-Estimation in Partially Observable
Press. Markovian Decision Processes. In Proceedings of the
Eleventh International Conference on Machine Learning
Shiller, R. J. (2000). Irrational Exuberance. Princeton
(pp. 284-292).
University Press.
Slovic, P. (1995). The construction of preference. The
Shleifer, A. (2000). Inefficient Markets. Oxford University
American Psychologist, 50, 364–371. doi:10.1037/0003-
Press. doi:10.1093/0198292279.001.0001
066X.50.5.364
Shott, M. (1999). Cranbrook Institute of Science . Bul-
Smith, V. (1976). Experimental economics: induced value
letin, 64, 71–82.
theory. The American Economic Review, 66(2), 274–279.
Shrestha, G. B., Song, K., & Goel, L. K. (2000). An Ef-
Sole, R., Bonabeau, E., Delgado, J., Fernandez, P., &
ficient Power Pool Simulator for the Study of Competi-
Marin, J. (2000). Pattern Formation and Optimization
tive Power Market. Power Engineering Society Winter
in Army Raids. [The MIT Press.]. Artificial Life, 6(3),
Meeting.
219–226. doi:10.1162/106454600568843
Simon, H. (1955). A behavioral model of rational choice.
Sornette, D. (2003). Why stock markets crash. Princeton
The Quarterly Journal of Economics, 69, 99–118.
University Press.
doi:10.2307/1884852
Standard & Poor’s. (2008a). S&P/Case-Shiller® Home
Simon, H. (1956). Rational choice and the structure of
Price Indices Methodology. Standard & Poor’s. Retrieved
the environment. Psychological Review, 63, 129–138.
December 2008, from http://www2.standardandpoors.
doi:10.1037/h0042769
com/spf/pdf/index/SP_CS_Home_ Price_Indices_ Meth-
Simon, H. (1965). The architecture of complexity. General odology_Web.pdf.
Systems, 10, 63–76.
Standard & Poor’s. (2008b). S&P/Case-Shiller Home Price
Simon, H. (1981). Studying human intelligence by creating Indices. Retrieved December 2008, from http://www2.
artificial intelligence. American Scientist, 69, 300–309. standardandpoors.com/ portal/site/sp/en/us/page.topic/
indices_csmahp/ 2,3,4,0,0,0,0,0,0,1,1,0,0,0,0,0.html.
Simon, H. (1996). The Sciences of the Artificial. Cam-
bridge, MA: MIT Press. Stanley, K., & Miikkulainen, R. (2002). Evolv-
ing neural networks through augmenting topolo-
Simon, H. A. (1997). Behavioral economics and bounded
gies . Evolutionary Computation, 10(2), 99–127.
rationality . In Simon, H. A. (Ed.), Models of Bounded
doi:10.1162/106365602320169811
Rationality (pp. 267–298). MIT Press.
Stark, T. (2008). Survey of professional forecasters: May
Simon, H. (2005). Darwinism, altruism and econom-
13, 2008. Federal Reserve Bank of Philadelphia. Retrieved
ics. In: K. Dopfer (Ed.), The Evolutionary Foundations
December 2008, from http://www.philadelphiafed.org/
of Economics (89-104), Cambridge, UK: Cambridge
files/spf/survq208.html
University Press.
Stolze, J., & Suter, D. (2004). Quantum Computing.
Singh, S. P., & Sutton, R. S. (1996). Reinforcement Learn-
Wiley-VCH. doi:10.1002/9783527617760
ing with Replacing Eligibility Traces. Machine Learning,
22(1-3), 123–158. doi:10.1007/BF00114726
337
Compilation of References
Stone, P., Sutton, R. S., & Kuhlmann, G. (2005). Suzuki, K., & Ohuchi, A. (1997). Reorganization of
Reinforcement Learning for RoboCup Soccer Agents with Pheromone Style Communication in Mulltiple
Keepaway. Adaptive Behavior, 13(3), 165–188. Monkey Banana Problem. In . Proceedings of Intelligent
doi:10.1177/105971230501300301 Autonomous Systems, 5, 615–622.
Stone, P., & Sutton, R. S. (2002). Keepaway Soccer: a Takahashi, H., & Terano, T. (2003). Agent-Based Ap-
machine learning testbed . In Birk, A., Coradeschi, S., & proach to Investors’ Behavior and Asset Price Fluctuation
Tadokoro, S. (Eds.), RoboCup-2001: Robot Soccer World in Financial Markets. Journal of Artificial Societies and
Cup V (pp. 214–223). doi:10.1007/3-540-45603-1_22 Social Simulation, 6(3).
Stone, P., Kuhlmann, G., Taylor, M. E., & Liu, Y. (2006). Takahashi, H., & Terano, T. (2004). Analysis of Micro-
Keepaway Soccer: From Machine Learning Testbed to Macro Structure of Financial Markets via Agent-Based
Benchmark . In Noda, I., Jacoff, A., Bredenfeld, A., & Model: Risk Management and Dynamics of Asset Pricing.
Takahashi, Y. (Eds.), RoboCup-2005: Robot Soccer World Electronics and Communications in Japan, 87(7), 38–48.
Cup IX. Berlin: Springer Verlag. doi:10.1007/11780519_9
Takahashi, H., Takahashi, S., & Terano, T. (2007). Ana-
Streichert, F., & Tanaka-Yamawaki, M. (2006). The Effect lyzing the Influences of Passive Investment Strategies
of Local Search on the Constrained Portfolio Selection on Financial Markets via Agent-Based Modeling . In
Problem. In Proceedings of 2006 IEEE Congress on Edmonds, B., Hernandez, C., & Troutzsch, K. G. (Eds.),
Evolutionary Computation (pp. 2368-2374). Social Simulation- Technologies, Advances, and New
Discoveries (pp. 224–238). Hershey, PA: Information
Sueyoshi, T., & Tadiparthi, G. R. (2007). Agent-based
Science Reference.
approach to handle business complexity in U.S. wholesale
power trading. IEEE Transactions on Power Systems, Takahashi, H. (2010), “An Analysis of the Influence of
532–543. doi:10.1109/TPWRS.2007.894856 Fundamental Values’ Estimation Accuracy on Financial
Markets, ” Journal of Probability and Statistics, 2010.
Sun, R., & Qi, D. (2001). Rationality Assumptions and
Optimality of Co-learning, In Design and Applications Takahashi, H., & Terano, T. (2006a). Emergence of
of Intelligent Agents (LNCS 1881, pp. 61-75). Berlin/ Overconfidence Investor in Financial markets. 5th In-
Heidelberg: Springer. ternational Conference on Computational Intelligence
in Economics and Finance.
Sutton, R., & Barto, A. G. (1998). Reinforcement Learning:
An Introduction. Cambridge, MA: MIT Press. Takahashi, H., & Terano, T. (2006b). Exploring Risks of
Financial Markets through Agent-Based Modeling. In
Sutton, R. S. (1988). Learning to Predict by the Methods
Proc. SICE/ICASS 2006 (pp. 939-942).
of Temporal Differences. Machine Learning, 3, 9–44.
doi:10.1007/BF00115009 Takimoto, M., Mizuno, M., Kurio, M., & Kambayashi,
Y. (2007). Saving Energy Consumption of Multi-Robots
Sutton, R. S., & Barto, A. (1998). Reinforcement Learning:
Using Higher-Order Mobile Agents. In Proceedings of
An Introduction. Cambridge, MA: MIT Press.
the First KES International Symposium on Agent and
Sutton, R. S., McAllester, D., Singh, S. P., & Mansour, Multi-Agent Systems: Technologies and Applications
Y. (2000). Policy Gradient Methods for Reinforcement (LNAI 4496, pp. 549-558).
Learning with Function Approximation. Advances in
Neural Information Processing Systems, 12, 1057–1063.
338
Compilation of References
Taniguchi, K., Nakajima, Y., & Hashimoto, F. (2004). U.S. Bureau of Economic Analysis. (2008). Regional
A report of U-Mart experiments by human agents . In economic accounts: State personal income. Retrieved
Shiratori, R., Arai, K., & Kato, F. (Eds.), Gaming, Simula- December 2008, from http://www.bea.gov/regional/sqpi/
tions, and Society: Research Scope and Perspective (pp. default.cfm?sqtable=SQ1.
49–57). Springer.
U.S. Census Bureau. (2008a). Housing vacancies and
Terano, T., Nishida, T., Namatame, A., Tsumoto, S., home ownership. Retrieved December 2008, from http://
Ohsawa, Y., & Washio, T. (Eds.). (2001). New Frontiers www.census.gov/hhes/ www/histt10.html.
in Artificial Intelligence. Springer Verlag. doi:10.1007/3-
U.S. Census Bureau. (2008b). New residential construc-
540-45548-5
tion. Retrieved December 2008, from http://www.census.
Terano, T. (2007a). Exploring the Vast Parameter Space of gov/const/www/newresconstindex_excel.html.
Multi-Agent Based Simulation. In L. Antunes & K. Taka-
Ugajin, M., Sato, O., Tsujimura, Y., Yamamoto, H., Ta-
dama (Eds.), Proc. MABS 2006 (LNAI 4442, pp. 1-14).
kimoto, M., & Kambayashi, Y. (2007). Integrating Ant
Terano, T. (2007b). KAIZEN for Agent-Based Model- Colony Clustering Method to Multi-Robots Using Mobile
ing. In S. Takahashi, D. Sallach, & J. Rouchier (Eds.), Agents. In Proceedings of the Eigth Asia Pacific Industrial
Advancing Social Simulation -The First Congress- (pp. Engineering and Management System. CD-ROM.
1-6). Springer Verlag.
van Dinther, C. (2007). Adaptive Bidding in Single-Sided
Terano, T., Deguchi, H., & Takadama, K. (Eds.). (2003), Auctions under Uncertainty: An Agent-based Approach
Meeting the Challenge of Social Problems via Agent-Based in Market Engineering (Whitestein Series in Software
Simulation: Post Proceedings of The Second International Agent Technologies and Autonomic Computing). Basel:
Workshop on Agent-Based Approaches in Economic and Birkhäuser.
Social Complex Systems. Springer Verlag.
Vandersypen, L.M.K., Yannoni, C.S., & Chuang, I.L.
Tesfatsion, L. (2002). Agent-based computational eco- (2000). Liquid state NMR Quantum Computing.
nomics: Growing economies from the bottom up. Artificial
Vanstone, B., & Finnie, G. (2007). An empirical method-
Life, 8, 55–82. doi:10.1162/106454602753694765
ology for developing stockmarket trading systems using
Thomas, R., Kemp, A., & Lewis, C. (1973).. . Canadian artificial neural networks. Retrieved December 2008,
Journal of Earth Sciences, 10, 226–271. from http://epublications.bond.edu.au/cgi/ viewcontent.
cgi? article=1022&context=infotech_pubs.
Toyoda, Y., & Yano, F. (2004). Optimizing Movement of a
Multi-Joint Robot Arm with Existence of Obstracles Using Von-Wun Soo. (2000). Agent Negotiation under Uncer-
Multi-Purpose Genetic Algorithm. Industrial Engineering tainty and Risk In Design and Applications of Intelligent
and Management Systems, 3(1), 78–84. Agents (LNCS 1881, pp. 31-45). Berlin/Heidelberg:
Springer.
Triani, V., et al. (2007). From Solitary to Collective
Behaviours: Decision Making and Cooperation, In Pro- Wang, T., & Zhang, H. (2004). Collective Sorting with
ceedings of the 9th European Conference in Artificial Multi-Robot. In Proceedings of the First IEEE Inter-
Life (pp. 575-584). national Conference on Robotics and Biomimetics (pp.
716-720).
Tsang, E., Li, J., & Butler, J. (1998). EDDIE beats
the bookies. Int. J. Software. Practice and Experi- Warner, G., Hebda, R., & Hahn, B. (1984).. . Palaeogeog-
ence, 28, 1033–1043. doi:10.1002/(SICI)1097- raphy, Palaeoclimatology, Palaeoecology, 45, 301–345.
024X(199808)28:10<1033::AID-SPE198>3.0.CO;2-1 doi:10.1016/0031-0182(84)90010-5
339
Compilation of References
Warren, M. (1994). Stock price prediction using genetic Wooldridge, M. (2000). Reasoning about Rational Agents.
programming . In Koza, J. (Ed.), Genetic Algorithms at Cambridge, MA: The MIT Press.
Stanford 1994. Stanford, CA: Stanford Bookstore.
Wu, W., Ekaette, E., & Far, B. H. (2003). Uncertainty
Watkins, C. J. H., & Dayan, P. (1992). Techni- Management Framework for Multi-Agent System, Pro-
cal note: Q-learning . Machine Learning, 8, 55–68. ceedings of ATS http://www.enel.ucalgary.ca/People/far/
doi:10.1023/A:1022676722315 pub/papers/2003/ATS2003-06.pdf
Weber, E., Johnson, E., Milch, K., Chang, H., Brodscholl, Xia, Y., Liu, B., Wang, S., & Lai, K. K. (2000). A Model
J., & Goldstein, D. (2007). Asymmetric discounting for Portfolio Selection with Order of Expected Returns.
in intertemporal choice: A query-theory account. Psy- Computers & Operations Research, 27, 409–422.
chological Science, 18, 516–523. doi:10.1111/j.1467- doi:10.1016/S0305-0548(99)00059-3
9280.2007.01932.x
Yao, X. (1999). Evolving artificial networks. Proceedings
Weber, B., Schupp, J., Reuter, M., Montag, C., Siegel, of the IEEE, 87(9), 1423–1447. doi:10.1109/5.784219
N., Dohmen, T., et al. (2008). Combining panel data
Yeh, C.-H., & Chen, S.-H. (2001). Market diversity and
and genetics: Proof of principle and first results. Paper
market efficiency: The approach based on genetic pro-
presented at Annual Conference on Neuroeconomics,
gramming. Journal of Artificial Simulation of Adaptive
Park City, Utah.
Behavior, 1(1), 147–165.
Williams, R. J. (1992). Simple Statistical Gradient Fol-
Zacharia, G., & Maes, P. (2000). Trust management through
lowing Algorithms for Connectionist Reinforcement
reputation mechanisms. Applied Artificial Intelligence Jour-
Learning. Machine Learning, 8, 229–256. doi:10.1007/
nal, 14(9), 881–908. doi:10.1080/08839510050144868
BF00992696
Zhan, W., & Friedman, D. (2007). Markups in double auc-
Wolfram, S. (2002). A New Kind of Science. Wolfram
tion markets. Journal of Economic Dynamics & Control,
Media Inc.
31, 2984–3005. doi:10.1016/j.jedc.2006.10.004
Wood, W., & Kleb, W. (2002). Extreme Programming in a
research environment . In Wells, D., & Williams, L. (Eds.),
XP/Agile Universe 2002 (pp. 89–99). doi:10.1007/3-540-
45672-4_9
340
About the Contributors
Shu-Heng Chen is a professor in the Department of Economics and Director of the Center of Interna-
tional Education and Exchange at National Chengchi University. He also serves as the Director of
the AI-ECON Research Center, National Chengchi University, the editor-in-chief of the Journal of New
Mathematics and Natural Computation (World Scientific), the associate editor of the Journal of Economic
Behavior and Organization, and the editor of the Journal of Economic Interaction and Coordination. Dr.
Chen holds an M.A. degree in mathematics and a Ph.D. in Economics from the University of California
at Los Angeles. He has more than 150 publications in international journals, edited volumes and confer-
ence proceedings. He has been invited to give keynote speeches and plenary talks at many international
conferences. He is also the editor of the volume “Evolutionary Computation in Economics and Finance”
(Physica-Verlag, 2002), “Genetic Algorithms and Genetic Programming in Computational Finance”
(Kluwer, 2002), and the co-editor of Volumes I & II of “Computational Intelligence in Economics
and Finance” (Springer-Verlag, 2002 & 2007), “Multi-Agent for Mass User Support” (Springer-Verlag,
2004), “Computational Economics: A Perspective from Computational Intelligence” (IGI publisher,
2005), and “Simulated Evolution and Learning,” Lecture Notes in Computer Science (LNCS 4247)
(Springer, 2006), as well as the guest editor of Special Issue on Genetic Programming, International
Journal on Knowledge Based Intelligent Engineering Systems (2008). His research interests are mainly
on the applications of computational intelligence to agent-based computational economics and finance
as well as experimental economics. Details of Shu-Heng Chen can be found at http://www.aiecon.org/
or http://www.aiecon.org/staff/shc/E_Vita.htm.
Yasushi Kambayashi is an associate professor in the Department of Computer and Information En-
gineering at the Nippon Institute of Technology. He worked at Mitsubishi Research Institute as a staff
researcher before joining the Institute. His research interests include theory of computation, theory and
practice of programming languages, and political science. He received his PhD in Engineering from
the University of Toledo, his MS in Computer Science from the University of Washington, and his BA
in Law from Keio University. He is a committee member of IARIA International Multi-Conference on
Computing in the Global Information Technology and IARIA International Conference on Advances in
P2P Systems, a review committee member of Peer-to-Peer Networking and Applications, and a member
of Tau Beta Pi, ACM, IEEE Computer Society, IPSJ, JSSST, IEICE System Society, IADIS, and Japan
Flutist Association.
Hiroshi Sato is an Assistant Professor in the Department of Computer Science at the National Defense Academy in Japan. He was previously a Research Associate in the Department of Mathematics and Information Sciences at Osaka Prefecture University in Japan. He holds a degree in Physics from Keio University in Japan, and the degrees of Master and Doctor of Engineering from the Tokyo Institute of Technology in Japan. His research interests include agent-based simulation, evolutionary computation, and artificial intelligence. He is a member of the Japanese Society for Artificial Intelligence and the Society for Economic Science with Heterogeneous Interacting Agents.
***
Akira Namatame is a Professor in the Department of Computer Science, National Defense Academy of Japan. He holds a degree in Engineering in Applied Physics from the National Defense Academy, a Master of Science in Operations Research, and a Ph.D. in Engineering-Economic Systems from Stanford University. His research interests include multi-agent systems, game theory, evolution and learning, complex networks, economic sciences with interacting agents, and the science of collectives.
Chung-Ching Tai received his Ph.D. degree in Economics from National Chengchi University,
Taiwan, R.O.C. in 2008. He conducted his post-doctoral studies at the AI-ECON Research Center, National
Chengchi University, under Dr. Shu-Heng Chen from 2008 to 2009. He is currently an assistant profes-
sor in the Department of Economics at Tunghai University, Taiwan, R.O.C.
Dipti Srinivasan obtained her M.Eng. and Ph.D. degrees in Electrical Engineering from the National
University of Singapore (NUS) in 1991 and 1994, respectively. She worked at the University of California at Berkeley’s Computer Science Division as a post-doctoral researcher from 1994 to 1995. In June 1995,
she joined the faculty of the Electrical and Computer Engineering department at the National Univer-
sity of Singapore, where she is an associate professor. From 1998 to 1999, she was a visiting faculty in
the Department of Electrical and Computer Engineering at the Indian Institute of Science, Bangalore,
India. Her main areas of interest are neural networks, evolutionary computation, intelligent multi-agent
systems and application of computational intelligence techniques to engineering optimization, plan-
ning and control problems in intelligent transportation systems and power systems. Dipti Srinivasan
is a senior member of IEEE and a member of IES, Singapore. She has published over 160 technical
papers in international refereed journals and conferences. She currently serves as an associate editor of
IEEE Transactions on Neural Networks, an editor of IEEE Transactions on Intelligent Transportation Systems, an area editor of the International Journal of Uncertainty, Fuzziness and Knowledge-based Systems,
and as a managing guest editor of Neurocomputing.
Farshad Fotouhi received his Ph.D. in computer science from Michigan State University in 1988. He joined the faculty of Computer Science at Wayne State University in August 1988, where he is currently Professor and Chair of the department. Dr. Fotouhi’s major areas of research include XML databases, the semantic web, multimedia systems, and biocomputing. He has published over 100 papers in refereed journals and conference proceedings and has served as a program committee member of various database-related conferences. Dr. Fotouhi is on the editorial boards of the IEEE Multimedia Magazine and the International Journal on Semantic Web and Information Systems, and he serves as a member of the Steering Committee of the IEEE Transactions on Multimedia.
Guy Meadows has been a faculty member at the University of Michigan since 1977. His areas of research include field and analytical studies of marine environmental hydrodynamics, with emphasis on mathematical modeling of nearshore waves, currents, and shoreline evolution; active microwave remote sensing of ocean dynamics, including wave/wave, wave/current, and wave/topographic interactions, with recent work on signatures of surface ship wakes and naturally occurring ocean surface processes; and the development of in situ and remote oceanographic instrumentation and data acquisition systems designed to measure the spatial and temporal structure of coastal boundary layer flows.
Hafiz Farooq Ahmad is an Associate Professor at the School of Electrical Engineering and Computer Science (SEECS), NUST, Islamabad, Pakistan, and also holds a joint appointment as a Consultant Engineer at DTS Inc., Tokyo, Japan. He received his PhD from the Tokyo Institute of Technology in 2002 under the supervision of Prof. Kinji Mori. His main research topics are autonomous decentralized systems, multi-agent systems, the autonomous semantic grid, and the semantic web. He is a member of IEEE.
Hidemi Yamachi is an assistant professor in the Department of Computer and Information Engineering at the Nippon Institute of Technology, Japan. His research interests include optimization methods based on evolutionary computation and visualization. He received his PhD from Tokyo Metropolitan University.
Hidenori Kawamura is an associate professor in the Graduate School of Information Science and Technology, Hokkaido University, Japan. His research interests include information science, complex systems, and multi-agent systems. He received his PhD, MS, and BA in Information Engineering from Hokkaido University. Contact him at the Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Sapporo, Hokkaido, 060-0814, Japan; [email protected]
Hiroki Suguri is Professor of Information Systems at School of Project Design, Miyagi University,
where he teaches systems design, object-oriented modeling, Java programming and information literacy.
Professor Suguri received his Ph.D. in software information systems from Iwate Prefectural University
in 2004. His research interests include multi-agent systems, semantic grid/cloud infrastructure, man-
agement information systems, and computer-aided education of information literacy.
Hisashi Yamamoto is an associate professor in the Department of System Design at Tokyo Metropolitan University. He received a BS degree, an MS degree, and a Dr. Eng. in Industrial Engineering from the Tokyo Institute of Technology, Japan. His research interests include reliability engineering, operations research, and applied statistics. He received best paper awards from the REAJ and the IEEE Reliability Society Japan Chapter.
John O’Shea is a Professor of Anthropology and Curator of Great Lakes Archaeology in the Museum
of Anthropology. He earned his Ph.D. in Prehistoric Archaeology from Cambridge University in 1978.
His research focused on the ways in which the archaeological study of funerary customs could be used
to recover information on the social organization of past cultures. O’Shea maintains active research
interests in Eastern Europe and North America. His topical interests include: tribal societies, prehis-
toric ecology and economy, spatial analysis, ethnohistory, Native North America and later European
Prehistory. His research in Native North America focuses on the late pre-contact and contact periods in
the Upper Great Lakes and the Great Plains. In Europe, his research centers on the eastern Carpathian
Basin region of Hungary, Romania and northern Yugoslavia during the later Neolithic through Bronze
Age. Most recently, he has begun a program of research focused on the study of Nineteenth Century
shipwrecks in the Great Lakes. In addition, O’Shea directs a series of programs in local archaeology,
including the Archaeology in an Urban Setting project within the City of Ann Arbor, and the Vanishing
Farmlands Survey in Washtenaw County. Within the profession, O’Shea is the editor-in-chief of the
Journal of Anthropological Archaeology (Academic Press). He has also been active in the implementa-
tion of the Native American Grave Protection and Repatriation Act (NAGPRA) and was appointed in
1998 to a six-year term on the NAGPRA Review Committee by the Secretary of the Interior.
Kazuhiro Ohkura received a PhD degree in engineering from Hokkaido University in 1997. He
is currently a professor in the graduate school of Mechanical Systems Engineering at Hiroshima Uni-
versity, Japan, and the leader of the Manufacturing Systems Laboratory. His research interests include
evolutionary algorithms, reinforcement learning and multiagent systems.
Keiji Suzuki is a professor in the Graduate School of Information Science and Technology, Hokkaido University, Japan. His research interests include information science, complex systems, and multi-agent systems. He received his PhD, MS, and BA in Precision Engineering from Hokkaido University. Contact him at the Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9, Sapporo, Hokkaido, 060-0814, Japan; [email protected]
Mak Kaboudan is a full professor of statistics in the School of Business, University of Redlands. Mak has an MS (1978) and a Ph.D. (1980) in Economics from West Virginia University. Before joining Redlands in 2001, he was a tenured associate professor at Penn State’s Smeal College of Business. Prior to joining Penn State, he worked as a management consultant for five years. His consulting work was mostly in economic and business planning as well as energy and macroeconomic modeling. Mak’s
current research interests are focused on forecasting business, financial, and economic conditions using
statistical and artificial intelligence modeling techniques. His work is published in many academic jour-
nals such as the Journal of Forecasting, Journal of Real Estate Literature, Journal of Applied Statistics,
Computational Economics, Journal of Economic Dynamics and Control, Computers and Operations
Research, and Journal of Geographical Systems.
Masanori Goka received a PhD degree in engineering from Kobe University in 2007. He is currently
a researcher at the Hyogo Prefectural Institute of Technology, Japan. His research interests include multi-agent systems, emergent systems, and embodied cognition.
Masao Kubo graduated from the Department of Precision Engineering, Hokkaido University, in 1991. He received his Ph.D. degree in Computer Science from Hokkaido University in 1996. He was a research assistant in the Chaotic Engineering Laboratory, Hokkaido University (1996-1999), and a lecturer in the Robotics Laboratory, Department of Computer Science, National Defense Academy, Japan. He was a visiting research fellow at the Intelligent Autonomous Lab, University of the West of England (2003-2005). He is now an associate professor in the Information Systems Laboratory, Department of Computer Science, National Defense Academy, Japan.
Munehiro Takimoto is an assistant professor in the Department of Information Sciences at the Tokyo
University of Science, Japan. His research interests include design and implementation of programming
languages. He received his BS, MS and PhD in Engineering from Keio University.
Azzam ul Asar completed his BSc in Electrical Engineering and MSc in Electrical Power Engineering from NWFP UET Peshawar in 1979 and 1987, respectively. He completed his PhD in artificial neural networks at the University of Strathclyde, Glasgow, UK, in 1994, followed by a post-doctorate in intelligent systems in 2005 from the New Jersey Institute of Technology, Newark, USA. He also served as a visiting professor at the New Jersey Institute of Technology, USA, from June 2004 to June 2005. He currently chairs the IEEE Peshawar Subsection and the IEEE Power Engineering joint chapter, and has been Dean of the Faculty of Engineering and Technology, Peshawar, NWFP, since November 2008.
R. Suzuki received his Ph.D. degree from Nagoya University in 2003. He is now an associate pro-
fessor in the graduate school of information science, Nagoya University. His main research fields are
artificial life and evolutionary computation. In particular, he investigates how evolutionary processes
can be affected by ecological factors such as lifetime learning (phenotypic plasticity), niche construc-
tion, and network structures of interactions.
Ren-Jie Zeng has been an assistant research fellow at the Taiwan Institute of Economic Research since 2008. His current research concerns macroeconomic and industrial studies of the Chinese economy. He holds an M.A. degree in Economics from National Chengchi University in Taiwan.
Robert G. Reynolds received his Ph.D. degree in Computer Science, specializing in Artificial Intel-
ligence, in 1979 from the University of Michigan, Ann Arbor. He is currently a professor of Computer
Science and director of the Artificial Intelligence Laboratory at Wayne State University. He is an Adjunct
Associate Research Scientist with the Museum of Anthropology at the University of Michigan-Ann
Arbor. He is also affiliated with the Complex Systems Group at the University of Michigan-Ann Arbor
and is a participant in the UM-WSU IGERT program on Incentive-Based Design. His interests are in
the development of computational models of cultural evolution for use in the simulation of complex
organizations and in computer gaming applications. Dr. Reynolds produced a framework, Cultural
Algorithms, in which to express and computationally test various theories of social evolution using
multi-agent simulation models. He has applied these techniques to problems concerning the origins of
the state in the Valley of Oaxaca, Mexico, the emergence of prehistoric urban centers, the origins of
language and culture, and the disappearance of the Ancient Anasazi in Southwestern Colorado using game programming techniques. He has co-authored three books: Flocks of the Wamani (1989, Academic
Press), with Joyce Marcus and Kent V. Flannery; The Acquisition of Software Engineering Knowledge
(2003, Academic Press), with George Cowan; and Excavations at San Jose Mogote 1: The Household
Archaeology with Kent Flannery and Joyce Marcus (2005, Museum of Anthropology-University of
Michigan Press). Dr. Reynolds has received funding from both government and industry to support his
work. He has published over 250 papers on the evolution of social intelligence in journals, book chapters,
and conference proceedings. He is currently an associate editor for the IEEE Transactions on Compu-
tational Intelligence in Games, IEEE Transactions on Evolutionary Computation, International Journal
of Swarm Intelligence Research, International Journal of Artificial Intelligence Tools, International
Journal of Computational and Mathematical Organization Theory, International Journal of Software
Engineering and Knowledge Engineering, and the Journal of Semantic Computing.
Saba Mahmood is a PhD student at the School of Electrical Engineering and Computer Science, NUST, Islamabad, Pakistan. Her MS research area was Reputation Systems for Open Multiagent Systems. Her PhD research is on Formalism of Trust in Dynamic Architectures. She served as a lecturer in the Department of Computer Science at the Institute of Management Sciences, Peshawar, from 2004 to 2007. She is an active member of IEEE and has served as academic chair of the IEEE Peshawar Subsection.
Sachiyo Arai received the B.S. degree in electrical engineering, the M.S. degree in control engineering and cognitive science, and the Ph.D. degree in artificial intelligence from the Tokyo Institute of Technology in 1998. She worked at Sony Corporation for two years after receiving the B.S. degree. After receiving the Ph.D. degree, she spent a year as a research associate at the Tokyo Institute of Technology, worked as a postdoctoral fellow at the Robotics Institute at Carnegie Mellon University from 1999 to 2001, and was a visiting associate professor in the Department of Social Informatics at Kyoto University from 2001 to 2003. She is currently an associate professor of Urban Environment Systems in the Faculty of Engineering, Chiba University.
Shu G. Wang is an associate professor in the Department of Economics and also serves as the As-
sociate Director of the AI-ECON Research Center, National Chengchi University. Dr. Wang holds a
Ph.D. in Economics from Purdue University. His research interests are mainly in microeconomics, institutional economics, and law and economics, and more recently in agent-based computational economics and
experimental economics.
T. Arita received his B.S. and Ph.D. degrees from the University of Tokyo in 1983 and 1988. He is
now a professor in the graduate school of information science at Nagoya University. His research interest
is in artificial life, in particular in the following areas: evolution of language, evolution of cooperation,
interaction between evolution and learning, and swarm intelligence.
T. Logenthiran obtained his B.Eng. degree from the Department of Electrical and Electronic Engineering, University of Peradeniya, Sri Lanka. He is currently pursuing a Ph.D. degree in the Department of Electrical and Computer Engineering, National University of Singapore. His main areas of interest are distributed power systems and the application of intelligent multi-agent systems and computational intelligence techniques to power engineering optimization.
Tina Yu received her M.Sc. degree in Computer Science from Northeastern University, Boston, MA
in 1989. Between 1990 and 1995, she was a Member of Technical Staff at Bell-Atlantic (NYNEX) Sci-
ence and Technology, White Plains, NY. She went to University College London in 1996 and completed
her PhD in 1999. Between August 1999 and September 2005, she was with the Math Modeling team at Chevron Information Technology Company, San Ramon, CA. She joined the Department of Computer Science, Memorial University of Newfoundland, in October 2005.
Tzai-Der Wang is a researcher specializing in Artificial Life, Evolutionary Computation, Genetic Algorithms, Artificial Immune Systems, and Estimation of Distribution Algorithms. He was the program chair of CIEF 2007 and a program co-chair of SEAL 2006. He worked at the AI-ECON Research Center, National Chengchi University, Taiwan, R.O.C., as a post-doctoral researcher in 2008 and is currently an assistant professor in the Department of Industrial Engineering and Management, Cheng Shiu University, Taiwan, R.O.C. He also maintains an international research relationship with the Applied Computational Intelligence Research Unit of the University of the West of Scotland and collaborates with researchers there. He was supervised by Professor Colin Fyfe and graduated from the University of Paisley in 2002 with a PhD on the evolution of cooperation in artificial communities.
Xiangdong Che received his Bachelor’s degree in Electrical Engineering and an M.S. in Computer Science from Zhejiang University, China. He received his PhD in Computer Science from Wayne State University in 2009. He has been working as a computer engineer for 16 years. He is currently working for the Wayne State University Computing and Information Technology Division. His research interests primarily focus on Cultural Algorithms, socially motivated learning, evolutionary computation, complex systems, optimization, intelligent agents, and multi-agent simulation systems.
Y. Iwase received a B.S. degree from Toyama University in 2005 and an M.S. degree from Nagoya
University in 2007. Now, he is a Ph.D. student in the Graduate School of Information Science at Nagoya
University. His research interests include cellular automata, evolutionary computation and artificial life.
In particular, he investigates cellular automata interacting with the external environment.
Yousof Gawasmeh is a Ph.D. student in computer science at Wayne State University. He received his M.S. degree in computer science from the New York Institute of Technology and holds a B.S. degree from Yarmouk University. He is currently a graduate teaching assistant in computer science at Wayne State University, and previously worked as a lecturer in the Software Engineering Department at Philadelphia University, Jordan. Yousof is interested in artificially intelligent systems that operate in game domains. Most of his research centers on techniques for learning and planning that allow teams of agents to act intelligently in their environments. He is concentrating on implementing the Cultural Algorithm in multi-agent system games to organize the agents and direct them toward the optimal solution. One of his applications is the landbridge game.
Yukiko Orito is a lecturer in the Graduate School of Social Sciences (Economics) at Hiroshima University. Her research interests are in the analysis of combinatorial optimization problems in financial research. She received her Dr. Eng., MS, and BA degrees in Production and Information Engineering from the Tokyo Metropolitan Institute of Technology.
Kazuteru Miyazaki is an associate professor in the Department of Assessment and Research for Degree Awarding, National Institution for Academic Degrees and University Evaluation. His previous positions include: from 1996, assistant professor, Tokyo Institute of Technology; from 1998, research associate, Tokyo Institute of Technology; from 1999, associate professor, National Institution for Academic Degrees; and from 2000, associate professor, National Institution for Academic Degrees and University Evaluation. Miyazaki’s main works include “A Reinforcement Learning System for Penalty Avoiding in Continuous State Spaces” (with S. Kobayashi), Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol. 11, No. 6, pp. 668-676, 2007, and “Development of a Reinforcement Learning System to Play Othello” (with S. Tsuboi and S. Kobayashi), Artificial Life and Robotics, Vol. 7, No. 4, pp. 177-181, 2004. Miyazaki is also a member of the Japanese Society for Artificial Intelligence (JSAI), the Society of Instrument and Control Engineers (SICE), the Information Processing Society of Japan (IPSJ), the Japan Society of Mechanical Engineers (JSME), the Robotics Society of Japan (RSJ), and the Japanese Association of Higher Education Research.
Index
co-evolution 78, 81, 82, 86, 90, 91, 92, 94
cognitive capacity 96, 97, 99, 100, 104, 105, 108, 111, 112, 113, 116
collective behaviour 156, 159, 166, 172
communication networks 174, 175, 188
component object model (COM) 225, 226
computer simulations 134
conflicting agents 50, 55, 76
conflict structure 269, 286, 287, 289
constant absolute risk aversion (CARA) 40
constant relative risk aversion (CRRA) 40
continuing task 232, 233, 235, 238, 243, 244, 245
continuous time recurrent neural networks (CTRNN) 156, 157, 172
coordination between agents 220
corporate finance 134
credit assignment 235
Cultural Algorithms 195, 201, 202, 207
CurrentTime 237
D
decentralized pervasive systems 309, 310, 311, 320
decision making rules, micro-level 134, 135
designing the reward function 233
disobedient follower 23, 27
dissipative cellular automata 311
distributed energy resource 208, 230
distributed generation (DG) 210
dopaminergic reward prediction error (DRPE) 41
double auction (DA) 79, 80, 81, 82, 84, 92, 93
Dynamic Programming (DP) 267, 268
E
Early Archaic occupation 195
Easley-Ledyard (EL) 102
eBay 250
efficient market hypothesis (EMH) 119
emergent behaviors 311, 317, 319
evolutionary agent design 306
evolutionary computation 300
evolutionary robotics (ER) 156, 172
evolving artificial neural network (EANN) 156, 157, 160, 171
evolving autonomous robots 157
evolving CTRNN (eCTRNN) 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 171
experience manager 253, 254
Exploitation-oriented Learning (XoL) 267, 268, 277, 282, 283
F
financial economics research 134
financial market analyses 134
financial markets 118, 134, 135, 136, 137, 138, 139, 141, 143, 146, 149, 151, 153, 154
financial markets, efficiency of 134, 141
FIRE 249, 251, 255, 257, 260, 262, 263, 264
first order conflict (FOC) 54
follower agents 19, 20, 22, 23, 24, 25, 26, 27, 30, 33
foraging agents 194, 201
Friedman, Milton 118, 119
functional magnetic resonance imaging (fMRI) 38, 40, 45
fusion 50, 51, 54, 67, 68, 70, 71, 76
fuzzy logic 50, 51
G
generalized feedforward networks (GFF) 6
genetic algorithms (GA) 20, 21, 30, 31, 100, 157, 312, 313, 314, 319
genetic programming (GP) 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 79, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 97, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 115, 116, 117
Gjerstad-Dickhaut (GD) 98, 101
global configurations 315, 317, 319
goal-agent set 273, 274, 289, 290
Godderich region 194
Grid environment 250
H
Hedonic psychology 38
heterogeneous agent models (HAM) 121
heterogeneous cell states 319
high-density pheromones 299
sentence span test (SS) 104, 110
sharpe ratio 20
simulation agent 175, 176, 178, 179, 181
Smith, Adam 79, 92
social behaviors, macro-level 134, 135
social insects 295, 296
social intelligence 194, 195, 201, 202
spatial short-term memory test (SSTM) 104, 110
SPORAS 250
stability of transitions 317
Standard & Poor's Case-Shiller Indexes (S&P CSI) 2, 3
stigmergy 251, 252, 296, 297, 305, 306
Stochastic Gradient Ascent (SGA) 277, 278, 281, 282, 283, 285
stochastic policy 277, 279, 281, 282
stocks 118, 130, 131
suppression conditions 271, 287, 288, 289
Sussex approach 157
swarm behavior 295
swarm intelligence 191
swarm intelligence (SI) 251, 252
T
time of reward assignment 235
time series genetic programming (TSGP) 5, 6, 17
TOPIX 26
topology and weight evolving artificial neural networks (TWEANN) 156, 157, 160
traders 118, 119, 120, 121, 124, 126, 127, 128, 129, 130, 131, 132
traders, chartists 118, 121, 127
traders, fundamentalists 118, 120, 121, 126
traders, imitators 118, 119, 126, 127, 128, 129, 130, 131
traders, rational 118, 119, 126, 127, 128, 129, 130
trading strategies 118, 119, 120, 123, 130, 131
TRAVOS 250, 251
U
useful rational policy 268
Utility Gain (UG) 251, 257, 258, 262, 263
V
Virtual Organization 251
W
Wayne State University 193, 194
X
Xtreme Programming 199
Z
zero-intelligence constrained (ZIC) 101, 103
zero-intelligence plus (ZIP) 98, 101