
Multi-Agent Applications

with Evolutionary
Computation and
Biologically Inspired
Technologies:
Intelligent Techniques for
Ubiquity and Optimization

Shu-Heng Chen
National Chengchi University, Taiwan

Yasushi Kambayashi
Nippon Institute of Technology, Japan

Hiroshi Sato
National Defense Academy, Japan

Medical Information Science Reference


Hershey • New York
Director of Editorial Content: Kristin Klinger
Director of Book Publications: Julia Mosemann
Acquisitions Editor: Lindsay Johnston
Development Editor: Julia Mosemann
Publishing Assistant: Casey Conapitski, Travis Gundrum
Typesetter: Casey Conapitski, Travis Gundrum, Michael Brehm
Production Editor: Jamie Snavely
Cover Design: Lisa Tosheff

Published in the United States of America by


Medical Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com

Copyright © 2011 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or com-
panies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Multi-agent applications with evolutionary computation and biologically inspired technologies : intelligent techniques for
ubiquity and optimization / Yasushi Kambayashi, editor.
p. cm.
Includes bibliographical references and index. Summary: "This book compiles numerous ongoing projects and research
efforts in the design of agents in light of recent development in neurocognitive science and quantum physics, providing readers
with interdisciplinary applications of multi-agents systems, ranging from economics to engineering"-- Provided by publisher.
ISBN 978-1-60566-898-7 (hardcover) -- ISBN 978-1-60566-899-4 (ebook)
1. Multiagent systems. 2. Evolutionary computation. I. Kambayashi, Yasushi, 1958-
QA76.76.I58M78 2010
006.3'2--dc22
2010011642

British Cataloguing in Publication Data


A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
Table of Contents

Preface . ............................................................................................................................................... xvi

Acknowledgment.............................................................................................................................xxviii

Section 1
Multi-Agent Financial Decision Systems

Chapter 1
A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index................................... 1
Mak Kaboudan, University of Redlands, USA

Chapter 2
An Agent-Based Model for Portfolio Optimizations Using Search Space Splitting............................. 19
Yukiko Orito, Hiroshima University, Japan
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hisashi Yamamoto, Tokyo Metropolitan University, Japan

Section 2
Neuro-Inspired Agents

Chapter 3
Neuroeconomics: A Viewpoint from Agent-Based Computational Economics.................................... 35
Shu-Heng Chen, National Chengchi University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan

Chapter 4
Agents in Quantum and Neural Uncertainty.......................................................................................... 50
Germano Resconi, Catholic University Brescia, Italy
Boris Kovalerchuk, Central Washington University, USA
Section 3
Bio-Inspired Agent-Based Artificial Markets

Chapter 5
Bounded Rationality and Market Micro-Behaviors: Case Studies Based on Agent-Based
Double Auction Markets........................................................................................................................ 78
Shu-Heng Chen, National Chengchi University, Taiwan
Ren-Jie Zeng, Taiwan Institute of Economic Research, Taiwan
Tina Yu, Memorial University of Newfoundland, Canada
Shu G. Wang, National Chengchi University, Taiwan

Chapter 6
Social Simulation with Both Human Agents and Software Agents: An Investigation into
the Impact of Cognitive Capacity on Their Learning Behavior............................................................ 95
Shu-Heng Chen, National Chengchi University, Taiwan
Chung-Ching Tai, Tunghai University, Taiwan
Tzai-Der Wang, Cheng Shiu University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan

Chapter 7
Evolution of Agents in a Simple Artificial Market.............................................................................. 118
Hiroshi Sato, National Defense Academy, Japan
Masao Kubo, National Defense Academy, Japan
Akira Namatame, National Defense Academy, Japan

Chapter 8
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets . .................. 134
Hiroshi Takahashi, Keio University, Japan
Takao Terano, Tokyo Institute of Technology, Japan

Section 4
Multi-Agent Robotics

Chapter 9
Autonomous Specialization in a Multi-Robot System using Evolving Neural Networks................... 156
Masanori Goka, Hyogo Prefectural Institute of Technology, Japan
Kazuhiro Ohkura, Hiroshima University, Japan
Chapter 10
A Multi-Robot System Using Mobile Agents with Ant Colony Clustering........................................ 174
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hidemi Yamachi, Nippon Institute of Technology, Japan
Munehiro Takimoto, Tokyo University of Science, Japan

Section 5
Multi-Agent Games and Simulations

Chapter 11
The AGILE Design of Reality Games AI............................................................................................ 193
Robert G. Reynolds, Wayne State University, USA
John O’Shea, University of Michigan-Ann Arbor, USA
Xiangdong Che, Wayne State University, USA
Yousof Gawasmeh, Wayne State University, USA
Guy Meadows, University of Michigan-Ann Arbor, USA
Farshad Fotouhi, Wayne State University, USA

Chapter 12
Management of Distributed Energy Resources Using Intelligent Multi-Agent System...................... 208
Thillainathan Logenthiran, National University of Singapore, Singapore
Dipti Srinivasan, National University of Singapore, Singapore

Section 6
Multi-Agent Learning

Chapter 13
Effects of Shaping a Reward on Multiagent Reinforcement Learning................................................ 232
Sachiyo Arai, Chiba University, Japan

Chapter 14
Swarm Intelligence Based Reputation Model for Open Multiagent Systems..................................... 248
Saba Mahmood, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan
Azzam ul Asar, Department of Electrical and Electronics Eng NWFP
University of Engineering and Technology, Pakistan
Hiroki Suguri, Miyagi University, Japan
Hafiz Farooq Ahmad, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan
Chapter 15
Exploitation-Oriented Learning XoL - A New Approach to Machine Learning
Based on Trial-and-Error Searches...................................................................................................... 267
Kazuteru Miyazaki, National Institution for Academic Degrees
and University Evaluation, Japan

Section 7
Miscellaneous

Chapter 16
Pheromone-Style Communication for Swarm Intelligence................................................................. 294
Hidenori Kawamura, Hokkaido University, Japan
Keiji Suzuki, Hokkaido University, Japan

Chapter 17
Evolutionary Search for Cellular Automata with Self-Organizing Properties
toward Controlling Decentralized Pervasive Systems......................................................................... 308
Yusuke Iwase, Nagoya University, Japan
Reiji Suzuki, Nagoya University, Japan
Takaya Arita, Nagoya University, Japan

Compilation of References................................................................................................................ 321

About the Contributors..................................................................................................................... 341

Index.................................................................................................................................................... 349
Detailed Table of Contents

Preface . ............................................................................................................................................... xvi

Acknowledgment.............................................................................................................................xxviii

Section 1
Multi-Agent Financial Decision Systems

Chapter 1
A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index................................... 1
Mak Kaboudan, University of Redlands, USA

Successful decision-making by home-owners, lending institutions, and real estate developers among
others is dependent on obtaining reasonable forecasts of residential home prices. For decades, home-
price forecasts were produced by agents utilizing academically well-established statistical models. In
this chapter, several modeling agents will compete and cooperate to produce a single forecast. A cooper-
ative multi-agent system (MAS) is developed and used to obtain monthly forecasts (April 2008 through
March 2010) of the S&P/Case-Shiller home price index for Los Angeles, CA (LXXR). Monthly hous-
ing market demand and supply variables including conventional 30-year fixed real mortgage rate, real
personal income, cash out loans, homes for sale, change in housing inventory, and construction material
price index are used to find different independent models that explain percentage change in LXXR. An
agent then combines the forecasts obtained from the different models to obtain a final prediction.

Chapter 2
An Agent-Based Model for Portfolio Optimizations Using Search Space Splitting............................. 19
Yukiko Orito, Hiroshima University, Japan
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hisashi Yamamoto, Tokyo Metropolitan University, Japan

Portfolio optimization is the determination of the weights of assets to be included in a portfolio in order
to achieve the investment objective. It can be viewed as a tight combinatorial optimization problem that
has many solutions near the optimal solution in a narrow solution space. In order to solve such a tight
problem, this chapter introduces an Agent-based Model. The authors employ the Information Ratio, a
well-known measure of the performance of actively managed portfolios, as the objective function. Each
agent has one portfolio, its Information Ratio, and its character as a set of properties. The evolution of
the agent properties splits the search space into many small spaces. In the population of each small space,
there is one leader agent and several follower agents. As the processing of the populations progresses,
the agent properties change through the interaction between the leader and the followers, and when the
iterations are over, the authors obtain the one leader who has the highest Information Ratio.

Section 2
Neuro-Inspired Agents

Chapter 3
Neuroeconomics: A Viewpoint from Agent-Based Computational Economics.................................... 35
Shu-Heng Chen, National Chengchi University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan

Recently, the relation between neuroeconomics and agent-based computational economics (ACE) has
become an issue of concern to the agent-based economics community. Neuroeconomics can interest
agent-based economists when they inquire into the foundations or principles of software-agent design,
normally known as agent engineering. It has been shown in many studies that the design of software
agents is non-trivial and can determine what will emerge from the bottom up. Therefore, it has long
been asked whether we can sensibly design these software agents,
including both the choice of software agent models, such as reinforcement learning, and the parameter
setting associated with the chosen model, such as risk attitude. This chapter starts a formal inquiry by
focusing on examining the models and parameters used to build software agents.

Chapter 4
Agents in Quantum and Neural Uncertainty.......................................................................................... 50
Germano Resconi, Catholic University Brescia, Italy
Boris Kovalerchuk, Central Washington University, USA

This chapter models quantum and neural uncertainty using a concept of the Agent–based Uncertainty
Theory (AUT). The AUT is based on complex fusion of crisp (non-fuzzy) conflicting judgments of
agents. It provides a uniform representation and an operational empirical interpretation for several un-
certainty theories such as rough set theory, fuzzy sets theory, evidence theory, and probability theory.
The AUT models conflicting evaluations that are fused in the same evaluation context. This agent ap-
proach also gives a novel definition of quantum uncertainty and of quantum computations for quantum
gates that are realized by unitary transformations of the state. In the AUT approach, unitary matrices
are interpreted as logic operations in logic computations. The authors show that by using permutation
operators any type of complex classical logic expression can be generated. With the quantum gate, the
authors introduce classical logic into the quantum domain. This chapter connects the intrinsic irratio-
nality of the quantum system and the non-classical quantum logic with the agents. The authors argue
that AUT can help to find meaning for quantum superposition of non-consistent states. Next, this chap-
ter shows that the neural fusion at the synapse can be modeled by the AUT in the same fashion. The
neuron is modeled as an operator that transforms classical logic expressions into many-valued logic
expressions. The motivation for such a neural network is to provide high flexibility and logic adaptation
of the brain model.

Section 3
Bio-Inspired Agent-Based Artificial Markets

Chapter 5
Bounded Rationality and Market Micro-Behaviors: Case Studies Based on Agent-Based
Double Auction Markets........................................................................................................................ 78
Shu-Heng Chen, National Chengchi University, Taiwan
Ren-Jie Zeng, Taiwan Institute of Economic Research, Taiwan
Tina Yu, Memorial University of Newfoundland, Canada
Shu G. Wang, National Chengchi University, Taiwan

This chapter investigates the dynamics of trader behaviors using an agent-based genetic programming
system to simulate double-auction markets. The objective of this study is two-fold. First, the authors
seek to evaluate how, if at all, the difference in trader rationality/intelligence influences trading behav-
ior. Second, besides rationality, they also analyze how, if at all, the co-evolution between two learnable
traders impacts their trading behaviors. The authors have found that traders with different degrees of
rationality may exhibit different behavior depending on the type of market they are in. When the market
has a profit zone to explore, the more intelligent trader demonstrates more intelligent behaviors. Also,
when the market has two learnable buyers, their co-evolution produced more profitable transactions
than when there was only one learnable buyer in the market. The authors have analyzed the trading
strategies and found that the learning behaviors are very similar to human decision-making. They plan
to conduct human subject experiments to validate these results in the near future.

Chapter 6
Social Simulation with Both Human Agents and Software Agents: An Investigation into
the Impact of Cognitive Capacity on Their Learning Behavior............................................................ 95
Shu-Heng Chen, National Chengchi University, Taiwan
Chung-Ching Tai, Tunghai University, Taiwan
Tzai-Der Wang, Cheng Shiu University, Taiwan
Shu G. Wang, National Chengchi University, Taiwan

This chapter presents agent-based simulations as well as human experiments in double auction markets.
The authors’ idea is to investigate the learning capabilities of human traders by studying learning agents
constructed by Genetic Programming (GP), which can further serve as a design platform for
conducting human experiments. By manipulating the population size of GP traders, the authors attempt
to characterize the innate heterogeneity in human beings' intellectual abilities. They find that GP trad-
ers are efficient in the sense that they can beat other trading strategies even with very limited learning
capacity. A series of human experiments and multi-agent simulations are conducted and compared at
the end of this chapter.

Chapter 7
Evolution of Agents in a Simple Artificial Market.............................................................................. 118
Hiroshi Sato, National Defense Academy, Japan
Masao Kubo, National Defense Academy, Japan
Akira Namatame, National Defense Academy, Japan

This chapter conducts a comparative study of various traders following different trading strategies. The
authors design an agent-based artificial stock market consisting of two opposing types of traders: “ra-
tional traders” (or “fundamentalists”) and “imitators” (or “chartists”). Rational traders trade by trying to
optimize their short-term income. On the other hand, imitators trade by copying the majority behavior
of rational traders. The authors obtain the wealth distribution for different fractions of rational traders
and imitators. When rational traders are in the minority, they can come to dominate imitators in terms
of accumulated wealth. On the other hand, when rational traders are in the majority and imitators are
in the minority, imitators can come to dominate rational traders in terms of accumulated wealth. The
authors show that survival in a financial market is a kind of minority game between behavioral types: rational
traders and imitators. The coexistence of rational traders and imitators in different combinations may
explain the market’s complex behavior as well as the success or failure of various trading strategies.

Chapter 8
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets . .................. 134
Hiroshi Takahashi, Keio University, Japan
Takao Terano, Tokyo Institute of Technology, Japan

This chapter describes advances of agent-based models to financial market analyses based on the au-
thors’ recent research. The authors have developed several agent-based models to analyze microscopic
and macroscopic links between investor behaviors and price fluctuations in a financial market. The
models are characterized by the methodology that analyzes the relations among micro-level decision
making rules of the agents and macro-level social behaviors via computer simulations. In this chap-
ter, the authors report the outline of recent results of their analysis. From the extensive analyses, they
have found that (1) investors’ overconfidence behaviors plays various roles in a financial market, (2)
overconfident investors emerge in a bottom-up fashion in the market, (3) they contribute to the efficient
trades in the market, which adequately reflects fundamental values, (4) the passive investment strategy
is valid in a realistic efficient market, however, it could have bad influences such as instability of mar-
ket and inadequate asset pricing deviations, and (5) under certain assumptions, the passive investment
strategy and active investment strategy could coexist in a financial market.
Section 4
Multi-Agent Robotics

Chapter 9
Autonomous Specialization in a Multi-Robot System using Evolving Neural Networks................... 156
Masanori Goka, Hyogo Prefectural Institute of Technology, Japan
Kazuhiro Ohkura, Hiroshima University, Japan

Artificial evolution has been considered a promising approach for coordinating the controller of
an autonomous mobile robot. However, it is not yet established whether artificial evolution is also ef-
fective in generating collective behaviour in a multi-robot system (MRS). In this study, two types of
evolving artificial neural networks are utilized in an MRS. The first is the evolving continuous-time re-
current neural network, which is used in the more conventional method, and the second is the topology-
and-weight-evolving artificial neural network, which is used in the novel method. Several computer
simulations are conducted in order to examine how the artificial evolution can be used to coordinate the
collective behaviour in an MRS.

Chapter 10
A Multi-Robot System Using Mobile Agents with Ant Colony Clustering........................................ 174
Yasushi Kambayashi, Nippon Institute of Technology, Japan
Yasuhiro Tsujimura, Nippon Institute of Technology, Japan
Hidemi Yamachi, Nippon Institute of Technology, Japan
Munehiro Takimoto, Tokyo University of Science, Japan

This chapter presents a framework using novel methods for controlling multiple mobile robots directed
by mobile agents over a communication network. Instead of physical movement of multiple robots,
mobile software agents migrate from one robot to another so that the robots more efficiently complete
their task. In some applications, it is desirable that multiple robots draw themselves together automati-
cally. In order to avoid excessive energy consumption, the authors employ mobile software agents to
locate robots scattered in a field, and cause them to autonomously determine their moving behaviors
by using a clustering algorithm based on the Ant Colony Optimization (ACO) method. ACO is a
swarm-intelligence-based method that exploits artificial stigmergy to solve combinatorial
optimization problems. Preliminary experiments have provided favorable results. Even though there
is much room to improve the collaboration of multiple agents and ACO, the current results suggest a
promising direction for the design of control mechanisms for multi-robot systems. This chapter focuses
on the implementation of the controlling mechanism of the multi-robot system using mobile agents.
Section 5
Multi-Agent Games and Simulations

Chapter 11
The AGILE Design of Reality Games AI............................................................................................ 193
Robert G. Reynolds, Wayne State University, USA
John O’Shea, University of Michigan-Ann Arbor, USA
Xiangdong Che, Wayne State University, USA
Yousof Gawasmeh, Wayne State University, USA
Guy Meadows, University of Michigan-Ann Arbor, USA
Farshad Fotouhi, Wayne State University, USA

This chapter investigates the use of agile program design techniques within an online game develop-
ment laboratory setting. The proposed game concerns the prediction of early Paleo-Indian hunting sites
in ancient North America along a now submerged land bridge that extended between Canada and the
United States across what is now Lake Huron. While the survey of the submerged land bridge was be-
ing conducted, the online class was developing a computer game that would allow scientists to predict
where sites might be located on the landscape. Crucial to this was the ability to gradually add in dif-
ferent levels of cognitive and decision-making capabilities for the agents. The authors argue that the
online component of the courses was critical to supporting an agile approach here. The results of the
study indeed provided a fusion of both survey and strategic information, suggesting that the movement of
caribou was asymmetric over the landscape. Therefore, the actual positioning of human artifacts such
as hunting blinds was designed to exploit caribou migration in the fall, as is observed today.

Chapter 12
Management of Distributed Energy Resources Using Intelligent Multi-Agent System...................... 208
Thillainathan Logenthiran, National University of Singapore, Singapore
Dipti Srinivasan, National University of Singapore, Singapore

The technology of intelligent Multi-Agent System (MAS) has radically altered the way in which com-
plex, distributed, open systems are conceptualized. This chapter presents the application of multi-agent
technology to design and deployment of a distributed, cross platform, secure multi-agent framework
to model a restructured energy market, where multiple players dynamically interact with each other to
achieve mutually satisfying outcomes. Apart from the security implementations, some of the best prac-
tices in Artificial Intelligence (AI) techniques were employed in the agent oriented programming to
deliver customized, powerful, intelligent, distributed application software which simulates the new
restructured energy market. The AI algorithm implemented as a rule-based system yielded accurate
market outcomes.
Section 6
Multi-Agent Learning

Chapter 13
Effects of Shaping a Reward on Multiagent Reinforcement Learning................................................ 232
Sachiyo Arai, Chiba University, Japan

The multiagent reinforcement learning approach is now widely applied to cause agents to behave ratio-
nally in a multiagent system. However, due to the complex interactions in a multiagent domain, it is
difficult to decide each agent's fair share of the reward for contributing to goal achievement.
This chapter reviews a reward shaping problem that defines when and what amount of reward should
be given to agents. The author employs keepaway soccer as a typical multiagent continuing task that
requires skilled collaboration between the agents. Shaping the reward structure for this domain is diffi-
cult for the following reasons: i) a continuing task such as keepaway soccer has no explicit goal, and so
it is hard to determine when a reward should be given to the agents, ii) in such a multiagent cooperative
task, it is difficult to fairly share the reward for each agent’s contribution. Through experiments, this
chapter finds that reward shaping has a major effect on an agent’s behavior.

Chapter 14
Swarm Intelligence Based Reputation Model for Open Multiagent Systems..................................... 248
Saba Mahmood, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan
Azzam ul Asar, Department of Electrical and Electronics Eng NWFP
University of Engineering and Technology, Pakistan
Hiroki Suguri, Miyagi University, Japan
Hafiz Farooq Ahmad, School of Electrical Engineering and Computer Science
(NUST-SEECS), Pakistan

In open multiagent systems, individual components act in an autonomous and uncertain manner, thus
making it difficult for the participating agents to interact with one another in a reliable environment.
Trust models have been devised that can create a level of certainty for the interacting agents. However,
trust requires reputation information that basically incorporates an agent's former behaviour. There
are two aspects of a reputation model, i.e., reputation creation and its distribution. Dissemination of
this reputation information in a highly dynamic environment is an issue that needs attention and a better
approach. The authors have proposed a swarm intelligence based mechanism whose self-organizing
behaviour not only provides an efficient way of reputation distribution but also involves various sources
of information to compute the reputation value of the participating agents. They have evaluated their
system with the help of a simulation showing the utility gain of agents utilizing the swarm-based reputation
system.
Chapter 15
Exploitation-Oriented Learning XoL - A New Approach to Machine Learning
Based on Trial-and-Error Searches...................................................................................................... 267
Kazuteru Miyazaki, National Institution for Academic Degrees
and University Evaluation, Japan

Exploitation-oriented Learning XoL is a new framework of reinforcement learning. XoL aims to learn
a rational policy whose expected reward per action is larger than zero, and does not require a so-
phisticated design of the value of a reward signal. In this chapter, as examples of learning systems that
belong to XoL, the authors introduce the rationality theorem of Profit Sharing (PS), the rationality the-
orem of reward sharing in multi-agent PS, and PS-r*. XoL has several features. (1) Though traditional
RL systems require appropriate reward and penalty values, XoL only requires an order of importance
among them. (2) XoL can learn more quickly since it traces successful experiences very strongly. (3)
XoL may be unsuitable for pursuing an optimal policy. The optimal policy can be acquired by the multi-
start method that needs to reset all memories to get a better policy. (4) XoL is effective on classes
beyond MDPs, since it is a Bellman-free method that does not depend on dynamic programming (DP). The authors show several
numerical examples to confirm these features.

Section 7
Miscellaneous

Chapter 16
Pheromone-Style Communication for Swarm Intelligence................................................................. 294
Hidenori Kawamura, Hokkaido University, Japan
Keiji Suzuki, Hokkaido University, Japan

Pheromones are important chemical substances that social insects use to realize cooperative collective
behavior. The most famous example of pheromone-based behavior is foraging. Real ants use pheromone
trails to inform each other where a food source exists, so that they can effectively reach and forage the
food. This sophisticated but simple communication method is useful for designing artificial multiagent
systems. In this chapter, evolutionary pheromone communication is proposed on a competitive ant
environment model, and the authors show two patterns of pheromone communication that emerged
through a co-evolutionary process driven by a genetic algorithm. In addition, such communication patterns are investigated
with Shannon’s entropy.

Chapter 17
Evolutionary Search for Cellular Automata with Self-Organizing Properties
toward Controlling Decentralized Pervasive Systems......................................................................... 308
Yusuke Iwase, Nagoya University, Japan
Reiji Suzuki, Nagoya University, Japan
Takaya Arita, Nagoya University, Japan
Cellular Automata (CAs) have been investigated extensively as abstract models of the decentralized
systems composed of autonomous entities characterized by local interactions. However, it is poorly
understood how CAs can interact with their external environment, which would be useful for imple-
menting decentralized pervasive systems that consist of billions of components (nodes, sensors, etc.)
distributed in our everyday environments. This chapter focuses on the emergent properties of CAs
induced by external perturbations toward controlling decentralized pervasive systems. The authors as-
sumed a minimum task in which a CA has to change its global state drastically after every occurrence
of a perturbation period. In the perturbation period, each cell state is modified by using an external rule
with a small probability. By conducting evolutionary searches for rules of CAs, the authors obtained in-
teresting behaviors of CAs in which their global state cyclically transited among different stable states
in either ascending or descending order. The self-organizing behaviors are due to the clusters of cell
states that dynamically grow through occurrences of perturbation periods. These results imply that the
global behaviors of decentralized systems can be dynamically controlled by states of randomly selected
components only.

Compilation of References................................................................................................................ 321

About the Contributors..................................................................................................................... 341

Index.................................................................................................................................................... 349

Preface

ABSTRACT

From a historical viewpoint, the development of multi-agent systems demonstrates how computer sci-
ence has become more social, and how the social sciences have become more computational. With this
development of cross-fertilization, our understanding of multi-agent systems may become partial if we
only focus on computer science or only focus on the social sciences. This book with its 17 chapters
intends to give a balanced sketch of the research frontiers of multi-agent systems. We trace the origins
of the idea, a biologically-inspired approach to multi-agent systems, to John von Neumann, and then
continue his legacy in this volume.

1. GENERAL BACKGROUND

The multi-agent system (MAS) is now an independent, but highly interdisciplinary, scientific subject. It
offers scientists a new research paradigm to study the existing complex natural systems, to understand
the underlying mechanisms by simulating them, and to gain the inspiration to design artificial systems
that can solve highly complex (difficult) problems or can create commercial value. From a historical
viewpoint, the development of multi-agent systems itself demonstrates how computer science has be-
come more social, and, in the meantime, how the social sciences have become more computational.
With this development of cross-fertilization, our understanding of multi-agent systems may become
partial if we only focus on computer science or only focus on the social sciences. A balanced view is
therefore desirable and becomes the main pursuit of this edited volume. In this volume, we attempt to
give a balanced sketch of the research frontiers of multi-agent systems, ranging from computer science
to the social sciences.
While there are many intellectual origins of the MAS, the book “Theory of Self-Reproducing Au-
tomata” by von Neumann (1903-1957) certainly contributes to a significant part of the later development
of MAS (von Neumann, 1966). In particular, it contributes to a special class of MAS, called cellular
automata, which motivates a number of pioneering applications of MAS to the social sciences in the
early 1970s (Albin, 1975). In this book, von Neumann suggested that an appropriate principle for de-
signing artificial automata can be productively inspired by the study of natural automata. Von Neumann
himself spent a great deal of time on the comparative study of the nervous systems or the brain (the
natural automata) and the digital computer (the artificial automata). In his book “The Computer and the
Brain”, von Neumann demonstrates the effect of interaction between the study of natural automata and
the design of artificial automata.
This biologically-inspired principle has been further extended by Arthur Burks, John Holland and
many others. By following this legacy, this volume has this biologically-inspired approach to multi-agent
systems as its focus. The difference is that we are now richly endowed with more natural observations
for inspiration, from evolutionary biology and neuroscience to ethology and entomology. The main
purpose of this book is to ground the design of multi-agent systems in biologically-inspired tools, such
as evolutionary computation, artificial neural networks, reinforcement learning, swarm intelligence,
stigmergic optimization, ant colony optimization, and ant colony clustering.
Given the two well-articulated goals above, this volume covers six subjects, which of course are not
exhaustive but are sufficiently representative of the current important developments of MAS and, in
the meantime, point to the directions for the future. The six subjects are multi-agent financial decision
systems (Chapters 1-2), neuro-inspired agents (Chapters 3-4), bio-inspired agent-based financial markets
(Chapters 5-8), multi-agent robots (Chapters 9-10), multi-agent games and simulation (Chapters 11-12),
and multi-agent learning (Chapters 13-15). Fifteen contributions to this volume are grouped by these subjects
into six sections of the volume. In addition to these six sections, a “miscellaneous” section is added to
include two contributions, each of which addresses an important dimension of the development of MAS.
In the following, we would like to give a brief introduction to each of these six subjects.

2. MULTI-AGENT FINANCIAL SYSTEMS

We start with the multi-agent financial system. The idea of using multi-agent systems to process infor-
mation has a long tradition in economics, even though in the early days the term MAS did not even exist.
In this regard, Hayek (1945) is an influential work. Hayek considered the market and the associated
price mechanism as a way of pooling or aggregating the market participants’ limited knowledge of the
economy. While the information owned by each market participant is imperfect, pooling it can
generate prices consistent with an efficient allocation of resources. The assertion of this article was later
coined the Hayek Hypothesis by Vernon Smith (Smith 1982) in his double auction market experiments. The
intensive study of the Hayek hypothesis in experimental economics has further motivated or strengthened
the idea of prediction markets. A prediction market essentially generates an artificial market environ-
ment such that the forecasts of crowds can be pooled so as to generate better forecasts. Predicting election
outcomes via what are known as political futures markets has become one of the most prominent applications.
On the other hand, econometricians tend to pool the forecasts made by different forecasting models
so as to improve their forecasting performance. In the literature, this is known as combined forecasts
(Clement 1989). Like prediction markets, combined forecasts tend to enhance the forecast accuracy.
The difference between prediction markets and combined forecasts is that agents in the former case are
heterogeneous in both data (the information acquired) and models (the way to process information),
whereas agents in the latter case are heterogeneous in models only. Hybrid systems in machine learning
or artificial intelligence can be regarded as a further extension of the combined forecasts, for example,
Kooths, Mitze, and Ringhut (2004). Their difference lies in the way they integrate the intelligence of the
crowd. Integration in the case of a combined forecast is much simpler, most of the time, consisting of just
the weighted combination of forecasts made by different agents. This type of integration can function well
because the market price under certain circumstances is just this simple linear combination of a pool of
forecasts. This latter property has been shown by the recent agent-based financial markets. Nevertheless,
the hybrid system is more sophisticated in terms of its integration. It is not just the horizontal combina-
tion of the pool, but also involves the vertical integration of it. In this way, heterogeneous agents do not
just behave independently, but work together as a team (Mumford and Jain, 2009).
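
To make the aggregation step concrete, a minimal sketch of a combined forecast is given below as a weighted linear combination of individual agents' predictions; the agent labels, forecast values, and weights are purely hypothetical and are not taken from Chapter 1.

    # Minimal sketch of a combined forecast: each agent supplies one forecast,
    # and the aggregate is a weighted linear combination of them.
    def combined_forecast(forecasts, weights):
        # Weighted average of the individual forecasts.
        return sum(f * w for f, w in zip(forecasts, weights)) / sum(weights)

    # Three hypothetical agents: a neural network, a genetic program, a regression.
    agent_forecasts = [0.8, 1.1, 0.9]   # forecast % change in the index
    agent_weights = [0.5, 0.3, 0.2]     # e.g. weights inversely related to past error

    print(combined_forecast(agent_forecasts, agent_weights))   # 0.91

A hybrid (vertically integrated) system, by contrast, would feed such intermediate outputs into further models rather than stopping at this single weighted sum.
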
Chapter 1 “A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index” authored
by Mak Kaboudan provides an illustration of the hybrid systems. He provides an agent-based forecasting
system of real estate. The system is composed of three types of agents, namely, artificial neural net-
works, genetic programming and linear regression. The system “aggregates” the dispersed forecasts of
these agents through a competition-cooperation cyclic phase. In the competition phase, best individual
forecasting models are chosen from each type of agent. In the cooperation phase, hybrid systems (rec-
onciliatory models) are constructed by combining artificial neural networks with genetic programming,
or by combining artificial neural networks with regression models, based on the solutions of the first
phase. Finally, there is again a competition between the individual models and the reconciliatory models.
Chapter 2 “An Agent-based Model for Portfolio Optimization Using Search Space Splitting” authored
by Yukiko Orito, Yasushi Kambayashi, Yasuhiro Tsujimura and Hisashi Yamamoto proposes a novel ver-
sion of genetic algorithms to solve the portfolio optimization problem. Genetic algorithms are population-
based search algorithms; hence, they can naturally be considered to be an agent-based approach, if we
treat each individual in the population as an agent. In Orito et al.’s case, each agent is an investor with
a portfolio over a set of assets. However, the authors do not use the standard single-population genetic
algorithm to drive the evolutionary dynamics of the portfolios. Instead, the whole society is divided into
many sub-populations (clusters of investors), within each of which there is a leader. The interactions of
agents are determined by their associated behavioral characteristics, such as leaders, obedient followers
or disobedient followers. These clusters and behavioral characteristics can constantly change during
the evolution: new leaders with new clusters may emerge to replace the existing ones. Like the previous
chapter, this chapter shows that the wisdom of crowds emerges from complex social dynamics rather
than just a static weighted combination.
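
As a rough illustration of the objective that drives this evolution, the sketch below computes the Information Ratio of a candidate portfolio against a benchmark and picks the agent with the highest value as the leader of its sub-population; the return series, variable names, and selection rule are simplified assumptions, not the authors' implementation.

    import statistics

    def information_ratio(portfolio_returns, benchmark_returns):
        # Active return relative to the benchmark, divided by its variability
        # (tracking error); higher values indicate better active management.
        active = [p - b for p, b in zip(portfolio_returns, benchmark_returns)]
        tracking_error = statistics.stdev(active)
        return statistics.mean(active) / tracking_error if tracking_error else 0.0

    # Each agent holds one candidate portfolio, summarized here by a return series.
    agents = {
        "agent_1": [0.020, 0.010, 0.030, -0.010],
        "agent_2": [0.012, 0.018, 0.021, 0.002],
    }
    benchmark = [0.010, 0.010, 0.020, 0.000]

    # The agent with the highest Information Ratio becomes the leader of its
    # sub-population; followers then adjust their properties toward the leader.
    leader = max(agents, key=lambda name: information_ratio(agents[name], benchmark))
    print(leader)
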

3. NEURO-INSPIRED AGENTS

Our brain itself is a multi-agent system; therefore, it is natural to study the brain as a multi-agent system
(de Garis 2008). In this direction, MAS is applied to neuroscience. However, the other direction also
exists. One recent development in multi-agent systems is to make software agents more human-like.
Various human factors, such as cognitive capacity, intelligence, personality attributes, emotion, and
cultural differences, have become new working dimensions for software agents. Since these human
factors have now been intensively studied in neuroscience with regard to their neural correlates, it is not
surprising to see that the design of autonomous agents, under this influence, will be grounded deeper
into neuroscience. Hence, the progress of neuroscience can impact the design of autonomous agents in
MAS. The next two chapters are written to feature this future.
Chapter 3 “Neuroeconomics: A Viewpoint from Agent-Based Computational Economics” by Shu-
Heng Chen and Shu G. Wang gives a review of how the recent progress in neuroeconomics may shed
light on different components of autonomous agents, including their preference formation, alternatives
valuation, choice making, risk perception, risk preferences, choice making under risk, and learning. The
last part of their review covers the well-known dual system conjecture, which is now the centerpiece
of neuroeconomic theory.
Chapter 4 “Agents in Quantum and Neural Uncertainty” authored by Germano Resconi and Boris
Kovalerchuk raises a very fundamental issue: does our brain fuzzify the received signals, even when
they are presented in a crisp way? They then further inquire into the nature of uncertainty and propose
a notion of uncertainty which is neural theoretic. A two-layered neural network is proposed to be able
to transform crisp signals into multi-valued outputs (fuzzy outputs). In this way, the source of fuzziness
comes from the conflicting evaluations of the same inputs made by different neurons, to some extent,
like Minsky’s society of minds (Minsky, 1998). Using various brain image technologies, the current
study of neuroscience has already explored various neural correlates when subjects are presented with
vague, incomplete and inconsistent information. This mounting evidence may put modal logic under
close examination and motivate us to think about some alternatives, like dynamic logic.
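
A toy numerical sketch of this idea follows, under the simplifying assumption that fuzziness arises by averaging the conflicting crisp (0/1) judgments of several neurons about the same input; this is only in the spirit of the Agent-based Uncertainty Theory, not the authors' formulation.

    # Several "neurons" each return a crisp 0/1 judgment about the same input;
    # the fused output is the fraction of agreeing neurons, i.e. a many-valued
    # (fuzzy) degree obtained from purely crisp but conflicting evaluations.
    def fuse_crisp_judgments(judgments):
        return sum(judgments) / len(judgments)

    judgments = [1, 1, 0, 1, 0]             # five conflicting crisp evaluations
    print(fuse_crisp_judgments(judgments))  # 0.6: a graded membership value
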

4. BIO-INSPIRED AGENT-BASED ARTIFICIAL MARKETS

The third subject of this volume is bio-inspired agent-based artificial markets. The market is another natural
demonstration of multi-agent systems. In fact, over the last decade, the market mechanism has inspired
the design of MAS, known as the market-based algorithm. To some extent, it has also revolutionized the
research paradigm of artificial intelligence by motivating distributed AI. However, in the reverse direc-
tion, MAS also provides economists with a powerful tool to explore and to test the market mechanism.
This research helps them to learn when markets may fail and hence how to do market design.
Nevertheless, the function of markets is not just about the institutional design (the so-called structur-
alism); a significant number of studies of artificial markets have found that institutional design is not
behavior-free or culture-free. This behavioral awareness and cultural awareness has now also become a
research direction in experimental economics and agent-based computational economics.
The four chapters contributing to this section all adopt a behavioral approach to the study of artificial
markets. Chapter 5 “Bounded Rationality and Market Micro-Behaviors: Case Studies Based on Agent-
Based Double Auction Markets” authored by Shu-Heng Chen, Ren-Jie Zeng, Tina Yu and Shu G Wang
can be read as an example of the recent attempt to model agents with different cognitive capacities or
intelligence. It is clear that human agents are heterogeneous in their cognitive capacity (intelligence),
and the effect of this heterogeneity on their economic and social status has been found in many recent
studies ranging from psychology and sociology to economics; nevertheless, conventional agent-based
models paid little attention to this development, and in most cases agents were explicitly or implicitly
assumed to be equally smart. By using genetic programming parameterized with different population
sizes, this chapter provides a pioneering study to examine the effect of cognitive capacity on the dis-
covery of trading strategies. It is found that larger cognitive capacity can contribute to the discovery
of more complex but more profitable strategies. It is also found that different cognitive capacities may
coordinate different matches of players' strategies in a co-evolutionary fashion, even though these matches
are not necessarily Nash equilibria.
Chapter 6 “Social Simulation with both Human Agents and Software Agents: An Investigation into
the Impact of Cognitive Capacity on Their Learning Behavior” authored by Shu-Heng Chen, Chung-
Ching Tai, Tzai-Der Wang and Shu G. Wang can be considered to be a continuation of the
cognitive agent-based models. What differs from the previous one is that this chapter considers not only
software agents with different cognitive capacity which is manipulated in the same way as in the previ-
ous chapter, but also human agents with different working memory capacity. A test borrowed
from psychology is employed to measure the working memory capacity of human subjects. By placing
software agents and human agents separately in a similar environment (double auction markets, in this
case) to play against the same group of opponents (Santa Fe program agents), they are able to examine
whether the economic significance of intelligence observed from human agents can be comparable to
that observed in the software agents, and hence to evaluate how well the artificial cognitive capacity has
mimicked the human cognitive capacity.
Chapter 7 “Evolution of Agents in a Simple Artificial Market” authored by Hiroshi Sato, Masao Kubo
and Akira Namatame is a work devoted to the growing literature on agent-based artificial stock markets.
As Chen, Chang and Du (2010) have surveyed, from the viewpoint of agent engineering, there are two
major classes of agent-based artificial stock markets. One comprises the H-type agent-based financial
models, and the other, the Santa-Fe-like agent-based financial models. The former has the agents whose
behavioral rules are known and, to some extent, are fixed and simple. The latter has the agents who are
basically autonomous, and their behavior, in general, can be quite complex. This chapter belongs to the
former, and considers two types of agents: rational investors and imitators. It uses the standard stochastic
utility function as the basis for deriving the Gibbs-Boltzmann distribution as the learning mechanism of
agents and shows the evolving microstructure (fraction) of these two types of agents and its connection
to the complex dynamics of financial markets.
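
The discrete-choice rule that typically underlies such a stochastic utility framework is the Gibbs-Boltzmann (softmax) distribution sketched below, in which the probability of adopting a strategy grows exponentially with its payoff; the payoff values and the intensity-of-choice parameter are illustrative assumptions only.

    import math

    def gibbs_boltzmann(payoffs, beta=1.0):
        # beta is the intensity of choice: a large beta makes agents almost
        # always switch to the best-performing strategy, while beta = 0 makes
        # the choice uniformly random.
        weights = [math.exp(beta * u) for u in payoffs]
        total = sum(weights)
        return [w / total for w in weights]

    # Hypothetical recent payoffs of the "rational" and "imitator" strategies.
    print(gibbs_boltzmann([0.4, 0.1], beta=2.0))
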
Chapter 8 “Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets”
authored by Hiroshi Takahashi and Takao Terano is another contribution to agent-based artificial stock
markets. It shares some similarities with the previous chapter; mainly, they both belong to the H-type
agent-based financial markets, categorized in Chen, Chang and Du (2010). However, this chapter distin-
guishes itself by incorporating the ingredients of behavioral finance into agent-based financial models,
a research trend perceived in Chen and Liao (2004). Specifically, this chapter considers passive and
active investors, overconfident investors, and prospects-based investors (Kahneman-Tversky inves-
tors). Within this framework, the authors address two frequently-raised issues in the literature. The first
one is the issue pertaining to survival analysis: among different types of agents, who can survive, and
under what circumstances? The second issue pertains to the traceability of the fundamental prices by
the market price: how far and for how long can the market price deviate from the fundamental price?
Their results and many others in the literature seem to indicate that the inclusion of behavioral factors
can quite strongly and persistently cause the market price to deviate from the fundamental price, and
that persistent deviation can exist even after allowing agents to learn.

5. MULTI-AGENT ROBOTICS

Section 4 comes to one of the most prominent applications of multi-agent systems, i.e., multi-agent robot-
ics. RoboCup (robotic soccer games), which was initiated in 1997, provides one of the exemplary
cases (Kitano, 1998). In this case, one has to build a team of agents that can play a soccer game against
a team of robotic opponents. The motivation of RoboCup is that playing soccer successfully demands
a range of different skills, such as real-time dynamic coordination using limited communication band-
width. Obviously, a formidable task in this research area is how to coordinate these autonomous agents
(robots) coherently so that a common goal can be achieved. This requires each autonomous robot to
follow a set of behavioral rules, and when they are placed in a distributed interacting environment, the
individual operation of these rules can collectively generate a desirable pattern. This issue is so basic
that it already existed at the very beginning of MAS, such as in pattern formation in cellular automata. The
simple cellular automata are homogeneous in the sense that all automata follow the same set of rules,
and there is a mapping from these sets of rules to the emergent patterns. Wolfram (2002) has worked
this out in quite some detail.
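
As a reminder of how such a rule-to-pattern mapping works, the following textbook sketch iterates a one-dimensional elementary cellular automaton (here Wolfram's rule 110 on a ring of cells); it is a generic illustration, not code from the book.

    def ca_step(cells, rule=110):
        # Each cell looks at its left neighbour, itself, and its right neighbour;
        # the 3-bit neighbourhood indexes into the 8-bit rule table.
        n = len(cells)
        return [(rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
                for i in range(n)]

    state = [0] * 15 + [1] + [0] * 15
    for _ in range(6):
        print("".join("#" if c else "." for c in state))
        state = ca_step(state)
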
Multi-robot systems can be considered to be an extension of the simple cellular automata. The issue
pursued here is an inverse engineering problem. Instead of asking what pattern emerges given a set of
rules, we are now asking what set of rules is required to generate certain kinds of patterns. This is the
coordination problem for not only multi-agent robots but also other kinds of MAS. Given the complex
structure of this problem, it is not surprising to see that evolutionary computation has been applied to
tackle this issue. In this part, we shall see two such studies.
Chapter 9 “Autonomous Specialization in a Multi-Robot System using Evolving Neural Networks”
authored by Masanori Goka and Kazuhiro Ohkura gives a concrete coordination problem for robots.
Ten autonomous mobile robots have to push three packages to the goal line. Each of these autonomous
robots is designed with a continuous-time recurrent artificial neural network. The coordination of them
is solved using evolutionary strategies and genetic algorithms. In the former case, the network structure
is fixed and only the connection weights evolve; in the latter case, the network structure is also evolved
with the connection weights. It has been shown that in the latter case and in the later stage, the team of
robots develops a kind of autonomous specialization, which divides the entire team into three sub-teams
to take care of each of the three packages separately.
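
The contrast between the two treatments can be caricatured as follows, under the simplifying assumption that a network is reduced to a dictionary of connection weights; the mutation operators and parameters are illustrative, not those actually used in Chapter 9.

    import random

    def mutate_weights(genome, sigma=0.1):
        # Evolution-strategy-style variation: perturb every connection weight.
        return {conn: w + random.gauss(0.0, sigma) for conn, w in genome.items()}

    def mutate_topology_and_weights(genome, nodes, p_add=0.2, sigma=0.1):
        # Topology-and-weight evolution: occasionally add a new connection
        # between two randomly chosen nodes, then perturb all weights.
        genome = dict(genome)
        if random.random() < p_add:
            src, dst = random.choice(nodes), random.choice(nodes)
            genome.setdefault((src, dst), random.gauss(0.0, 1.0))
        return mutate_weights(genome, sigma)

    genome = {("in0", "h0"): 0.5, ("h0", "out0"): -0.3}
    print(mutate_topology_and_weights(genome, ["in0", "h0", "out0"]))
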
Chapter 10 “A Multi-Robot System Using Mobile Agents with Ant Colony Clustering” authored by
Yasushi Kambayashi, Yasuhiro Tsujimura, Hidemi Yamachi, and Munehiro Takimoto presents another
coordination problem of multi-robot systems. In their case, the robots are the luggage carts used
in airports. These carts are picked up by travelers at designated points and left in arbitrary places.
They are then collected manually one by one, which is very laborious. Therefore, an intelligent design
is concerned with how these carts can draw themselves together at designated points, and how these
gathering places are determined. The authors apply the idea of mobile agents in this study. Mobile agents
are programs that can transmit themselves across an electronic network and recommence execution at
a remote site (Cockayne and Zyda, 1998). In this chapter, mobile agents are employed as the medium
between the host computer (a simulating agent) and all these scattered carts via RFID (Radio
Frequency Identification). The mobile software agent will first collect information with regard to the
initial distribution of these luggage carts, and this information will be sent back to the host computer,
which will then use ant colony clustering, an idea motivated by ant corpse gathering and brood sorting
behavior, to figure out the places to which these carts should return. The designated place for each cart
is then transmitted to each cart again via the mobile software agent.
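
The pick-up/drop-off rule at the core of ant colony clustering can be sketched as below, where an item (here, a cart) in a sparse neighbourhood tends to be picked up and one in a dense neighbourhood tends to be dropped; the constants and the notion of local density are generic textbook choices rather than the parameters of this chapter.

    def pick_probability(local_density, k1=0.10):
        # Likely to pick up an isolated item (low local density of similar items).
        return (k1 / (k1 + local_density)) ** 2

    def drop_probability(local_density, k2=0.15):
        # Likely to drop an item where similar items are already clustered.
        return (local_density / (k2 + local_density)) ** 2

    print(pick_probability(0.05), drop_probability(0.05))   # an isolated cart
    print(pick_probability(0.90), drop_probability(0.90))   # a cart in a dense cluster
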
The two chapters in this part stand in interesting, sharp contrast. The former involves the physical
movement of robots during the coordination process, whereas the latter does not involve physical move-
ment until the coordination problem has been solved via the simulation. In addition, the former uses
the bottom-up (decentralized) approach to cope with the coordination problem, whereas the latter uses
the top-down (centralized) approach to cope with the coordination problem, even though the employed
ant colony clustering itself is decentralized in nature. It has been argued that a distributed system can
coordinate itself well, for example, in the well-known El Farol problem (Arthur, 1994). In intelligent
transportation systems, it has also been proposed that a software driver be designed that can learn and
can assist the human drivers to avoid congested traffic routes, if these software drivers can be properly coordinated
first (Sasaki, Flann and Box, 2005). Certainly, this development may continue and, after reading these
two chapters, readers may be motivated to explore more on their own.

6. MULTI-AGENT GAMES AND SIMULATION

The analysis of social group dynamics through gaming for sharing of understanding, problem solving
and education can be closely tied to MAS. This idea has recently been demonstrated in Arai, Deguchi and
Matshi (2006). In this volume, we include two chapters contributing to gaming simulation.
Chapter 11 “Agile Design of Reality Games Online” authored by Robert Reynolds, John O’Shea,
Farshad Fotouhi, James Fogarty, Kevin Vitale and Guy Meadows is a contribution to the design of on-
line games. The authors introduce agile programming as an alternative to the conventional waterfall
model. In the waterfall model, the software development goes through a sequential process, which
demands that every phase of the project be completed before the next phase can begin. Yet, very little
communication occurs during the hand-offs between the specialized groups responsible for each phase
of development. Hence, when a waterfall project wraps, this heads-down style of programming may
create product that is not actually what the customer want. The agile-programming is then proposed as
an alternative to help software development teams react to the instability of building software through
an incremental and iterative work cycle, which is detailed in this chapter. The chapter then shows how
this incremental and iterative work cycle has been applied to develop an agent-based hunter-deer-wolf
game, in which the agents are individual hunters, deer, and wolves. Each of these individuals can act
on its own, but each also belongs to a group, a herd, or a pack, so that they may learn socially.
Social intelligence (swarm intelligence) can, therefore, be placed into this game; for example, agents
can learn via cultural algorithms (Reynolds, 1994, 1999). The results of these games are provided to a
group of archaeologists as an inspiration for their search for evidence of ancient human activity.
Chapter 12 “Management of Distributed Energy Resources Using Intelligent Multi-Agent System”
authored by T. Logenthiran and Dipti Srinivasan is a contribution to the young but rapidly-growing
literature on the agent-based modeling of electric power markets (Weidlich, 2008). The emergence of
this research area is closely related to the recent trend of deregulating electricity markets, which may
introduce competition to each constituent of the originally vertically-integrated industry, from generation,
transmission to distribution. Hence, not surprisingly, multi-agent systems have been applied to model
the competitive ecology of this industry. Unlike those chapters in Part III, this chapter is not directly in-
volved in the competitive behavior of buyers and sellers in the electricity markets. Instead, it provides
an in-depth description of the development of the simulation software for electricity markets. It clearly
specifies each agent of the electricity markets, in addition to the power-generating companies and consumers.
The authors then show how this knowledge can be functionally integrated into simulation software using
a multi-agent platform, such as JADE (Java Agent DEvelopment Framework).

7 MULTI-AGENT LEARNING

The sixth subject of the book is about learning in the context of MAS. Since the publication of Bush
and Mosteller (1955) and Luce (1959), reinforcement learning is no longer just a subject of psychology
itself, but has proved to be important for many other disciplines, such as economics and games. Since
the seminal work of Samuel (1959) on the checkers playing program, the application of reinforcement
learning to games is already more than 50 years old. The influential work by Sutton and Barto (1998) has
further pushed these ideas so that they are being widely used in artificial intelligence and control theory.
The advancement of various brain-image technologies, such as fMRI and positron emission tomogra-
phy, has enabled us to see how our brain has the built-in mechanism required for the implementation
of reinforcement learning. The description of reinforcement learning systems actually matches the
behavior of specific neural systems in the mammalian brain. One of the most important such systems is
the dopamine system and the role that it plays in learning about rewards and directing our choices that
lead us to rewards (Dow, 2003; Montague, 2007).
However, as with other multi-disciplinary developments, challenging issues also exist in reinforcement
learning. A long-lasting fundamental issue is the design or the determination of the reward function, i.e.,
reward as a function of state and action. Chapter 13 “Effects of Shaping a Reward on Multiagent Rein-
forcement Learning” by Sachiyo Arai and Nobuyuki Tanaka addresses two situations which may make
the reward function exceedingly difficult to design. In the first case, the task is continuously ongoing and it is
not clear when to reward. The second case involves the learning of team members as a whole instead of
individually. To make the team achieve its common goal, it may not be desirable to distribute the rewards
evenly among team members, but the situation can be worse if the rewards are not properly attributed
to the few deserving individuals. Arai and Tanaka address these two issues in the context of keepaway
soccer, in which a team tries to maintain ball possession by avoiding the opponent’s interceptions.
Trust has constantly been a heated issue in multi-agent systems. This is so because in many situations
agents have to decide with whom they want to interact and what strategies to use. By all means, they
want to be able to manage the risk of interacting with malicious agents. Hence, evaluating the trustwor-
thiness of “strangers” becomes crucial. People in daily life would be willing to invest to gain informa-
tion to deal with this uncertainty. Various social systems, such as rating agencies, social networks, etc.,
have been constructed so as to facilitate the acquiring of the reputations of agents. Chapter 14 “Swarm
Intelligence Based Reputation Model for Open Multiagent Systems” by Saba Mahmood, Assam Asar,
Hiroki Suguri and Hafiz Ahmad deals with the dissemination of updated reputations of agents. After
reviewing the existing reputation models (both centralized and decentralized ones), the authors propose
their construction using ant colony optimization.
Chapter 15 “Exploitation-oriented Learning XoL: A New Approach to Machine Learning Based on
Trial-and-Error Searches” by Kazuteru Miyazaki is also a contribution to reinforcement learning. As
we have said earlier, a fundamental challenge for reinforcement learning is the design of the reward
function. In this chapter, Miyazaki proposes a novel version of reinforcement learning based on many
of his earlier works on the rationality theorem of profit sharing. This new version, called XoL, differs
from the usual one in that reward signals only require an order of importance among the actions, which
facilitates the reward design. In addition, XoL is a Bellman-free method since it can work on the classes
beyond Markov decision processes. XoL can also learn fast because it traces successful experiences very
strongly. While the resultant solution can be biased, a cure is available through the multi-start method
proposed by the author.

8 MISCELLANEOUS

The last part of the book has “Miscellaneous” as its title. There are two chapters in this part. While these
two chapters could be related to and re-classified into some of the previous parts, we prefer to let them
“stand out” here so as not to blur their unique coverage. The first one in this part (Chapter 16) is related to the
multi-agent robotics and also to multi-agent learning, but it is the only chapter devoted to the simulation
of the behavior of insects, namely, the ant war. It is an application of MAS to entomology or computa-
tional entomology, and a biologically inspired approach that is applied back to the study of biology. The last
chapter of the book (Chapter 17) is devoted to cellular automata, an idea widely shared in many other
chapters of the book, but it is the only chapter which exclusively deals with this subject with an in-depth
review. As we have mentioned earlier, one of the origins of the multi-agent systems is von Neumann’s
cellular automata. It is indeed fitting that the book closes with the most recent developments on
this subject.
Chapter 16 “Pheromone-Style Communication for Swarm Intelligence” authored by Hidenori Kawamura
and Keiji Suzuki simulates two teams of ants competing for food. What concerns the authors is how
ants effectively communicate with their teammates so that the food collected can be maximized. In a
sense, this chapter is similar to the coordination problems observed in RoboCup. The difference is that
insects like ants or termites are cognitively even more limited than robots in RoboCup. Their decisions
and actions are rather random, requiring no memory or prior knowledge and involving no explicit
learning. Individually speaking, they are comparable to what economists know as zero-intelligence
agents (Gode and Sunder, 1993). Yet, entomologists have found that they can com-
municate well. The communication is however not necessarily direct, but more indirect, partially due to
their poor visibility. Their reliance on indirect communication was noticed by the French biologist
Pierre-Paul Grassé (1895-1985), who termed this style of communication or interaction stigmergy
(Grosan and Abraham, 2006).
He defined stigmergy as: “Stimulation of workers by the performance they have achieved.” Stigmergy
is a method of communication in which individuals communicate with one another by modify-
ing their local environment. The price mechanism familiar to economists is an example of stigmergy.
It does not require market participants to have direct interaction, but only indirect interaction via price
signals. In this case the environment is characterized as the price, which is constantly changed by market
participants and hence constantly invites others to take further actions.
In this chapter, Kawamura and Suzuki use genetic algorithms to simulate the co-evolution processes of
the emergent stigmergic communication among ants. While this study is specifically placed in a context
of an ant war, it should not be hard to see its potential in a more general context, such as the evolution
of language, norms and culture.
Chapter 17 “Evolutionary Search for Cellular Automata with Self-Organizing Properties toward
Controlling Decentralized Pervasive Systems” authored by Yusuke Iwase, Reiji Suzuki and Takaya Arita
brings us back to where we began in this introductory chapter, namely, cellular automata. As we have
noticed, from a design perspective, the fundamental issue is an inverse engineering problem, i.e., to
find out rules of automata by which our desired patterns can emerge. This chapter basically deals with
this kind of issue but in an environment different from the conventional cellular automata. The cellular
automata are normally run in a closed system. In this chapter, the authors consider an interesting exten-
sion by exposing them to an open environment or a pervasive system. In this case, each automaton will
receive external perturbations probabilistically. These perturbations will then change the operating rules
of the interfered cells, which in turn may have global effects. Having anticipated these properties, the
authors then use genetic algorithms to search for rules that may best work with these perturbations to
achieve a given task.
The issues and simulations presented in this chapter can have applications to social dynamics. For
example, citizens interact with each other in a relatively closed system, but each citizen may travel out
once in a while. When they return, their behavioral rules may change due to cultural exchange; hence
they will have an effect on their neighbors that may even have a global impact on the social dynamics.
In this vein, the other city which hosts these guests may experience similar kinds of changes. In this
way, the two systems (cities) are coupled together. People in cultural studies may be inspired by the
simulation presented in this chapter.

9 CONCLUDING REMARKS

When computer science becomes more social and the social sciences become more computational,
publications that can facilitate the dialogue between the two disciplines are in demand. This edited volume
demonstrates our efforts to work this out. It is our hope that more books or edited volumes as joint ef-
forts among computer scientists and social scientists will come, and, eventually, computer science will
help social scientists to piece together their “fragmental” social sciences, and the social sciences will
constantly provide computer scientists with fresh inspiration in defining and forming their new and
creative research paradigm. The dialogue between artificial automata and natural automata will then
continue and thrive.

Shu-Heng Chen
National Chengchi University, Taiwan

Yasushi Kambayashi
Nippon Institute of Technology, Japan

Hiroshi Sato
National Defense Academy, Japan

REFERENCES

Albin, P. (1975). The Analysis of Complex Socioeconomic Systems. Lexington, MA: Lexington Books.
Arai, K., Deguchi, H., & Matsui, H. (2006). Agent-Based Modeling Meets Gaming Simulation. Springer.
Arthur, B. (1994). Inductive reasoning and bounded rationality. American Economic Review, 84(2),
406–411.
Bush, R.R., & Mosteller, F. (1955). Stochastic Models for Learning. New York: John Wiley & Sons.
Hayek, F. (1945). The use of knowledge in society. American Economic Review, 35(4), 519-530.
Chen S.-H, & Liao C.-C. (2004). Behavior finance and agent-based computational finance: Toward an
integrating framework. Journal of Management and Economics, 8, 2004.
Chen S.-H, Chang C.-L, & Du Y.-R (in press). Agent-based economic models and econometrics. Knowl-
edge Engineering Review, forthcoming.
Clement, R. (1989). Combining forecasts: A review and annotated bibliography. International Journal
of Forecasting, 5, 559-583.
Cockayne, W., Zyda, M. (1998). Mobile Agents. Prentice Hall.
De Garis, H. (2008). Artificial brains: An evolved neural net module approach. In J. Fulcher & L. Jain
(Eds.), Computational Intelligence: A Compendium. Springer.
Dow, N. (2003). Reinforcement Learning Models of the Dopamine System and Their Behavior Implica-
tions. Doctoral Dissertation. Carnegie Mellon University.
Grosan, C., & Abraham, A. (2006). Stigmergic optimization: Inspiration, technologies and perspectives.
In A. Abraham, C. Grosan, & V. Ramos (Eds.), Stigmergic Optimization (pp. 1-24). Springer.
Gode, D., & Sunder, S. (1993). Allocative efficiency of markets with zero intelligence traders: Market
as a partial substitute for individual rationality. Journal of Political Economy, 101, 119-137.
Kitano, H. (Ed.) (1998) RoboCup-97: Robot Soccer World Cup I. Springer.
Kooths, S., Mitze, T., & Ringhut, E. (2004). Forecasting the EMU inflation rate: Linear econometric
versus non-linear computational models using genetic neural fuzzy systems. Advances in Econometrics,
19, 145-173.
Luce, D. (1959). Individual Choice Behavior: A Theoretical Analysis. Wiley.
Minsky, M. (1988). Society of Minds. Simon & Schuster.
Montague, R. (2007). Your Brain Is (Almost) Perfect: How We Make Decisions. Plume.
Mumford, C., & Jain, L. (2009). Computational Intelligence: Collaboration, Fusion and Emergence.
Springer.
Reynolds, R. (1994). An introduction to cultural algorithms. In Proceedings of the 3rd Annual Confer-
ence on Evolutionary Programming (pp. 131-139). World Scientific Publishing.
Reynolds, R. (1999). An overview of cultural algorithms. In D. Corne, F. Glover, M. Dorigo (Eds.), New
Ideas in Optimization (pp. 367-378). McGraw Hill Press.
Samuel, A. (1959). Some studies in machine learning using the game of checkers. IBM Journal, 3(3),
210-229.
Sasaki, Y., Flann, N., Box, P. (2005). The multi-agent games by reinforcement learning applied to on-line
optimization of traffic policy. In S.-H. Chen, L. Jain & C.-C. Tai (Eds.), Computational Economics: A
Perspective from Computational Intelligence (pp. 161-176). Idea Group Publishing.
Smith, V. (1982). Markets as economizers of information: Experimental examination of the “Hayek
Hypothesis”. Economic Inquiry, 20(2), 165-179.
Sutton, R.S., & Barto, A.G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
von Neumann, J. (1958). The Computer and the Brain. Yale University Press.
von Neumann, J. (1966). Theory of Self-Reproducing Automata (edited and completed by A. W. Burks).
Urbana, IL: University of Illinois Press.
Weidlich, A. (2008). Engineering Interrelated Electricity Markets: An Agent-Based Computational Ap-
proach. Physica-Verlag.
Wolfram, S. (2002). A New Kind of Science. Wolfram Media.

Acknowledgment

The editors would like to acknowledge the assistance of all involved in the collection and review process
of this book, without whose support the project could not have been completed. We wish to thank all the
authors for their great insights and excellent contributions to this book. Thanks to the publishing team
at IGI Global, for their constant support throughout the whole process. In particular, special thanks to
Julia Mosemann for her patience in taking this project to fruition.

Shu-Heng Chen
National Chengchi University, Taiwan

Yasushi Kambayashi
Nippon Institute of Technology, Japan

Hiroshi Sato
National Defense Academy, Japan
Section 1
Multi-Agent Financial Decision
Systems

Chapter 1
A Multi-Agent System
Forecast of the S&P/Case-
Shiller LA Home Price Index
Mak Kaboudan
University of Redlands, USA

ABSTRACT
Successful decision-making by home-owners, lending institutions, and real estate developers among
others is dependent on obtaining reasonable forecasts of residential home prices. For decades, home-
price forecasts were produced by agents utilizing academically well-established statistical models. In this
chapter, several modeling agents will compete and cooperate to produce a single forecast. A cooperative
multi-agent system (MAS) is developed and used to obtain monthly forecasts (April 2008 through March
2010) of the S&P/Case-Shiller home price index for Los Angeles, CA (LXXR). Monthly housing market
demand and supply variables including conventional 30-year fixed real mortgage rate, real personal
income, cash out loans, homes for sale, change in housing inventory, and construction material price
index are used to find different independent models that explain percentage change in LXXR. An agent
then combines the forecasts obtained from the different models to obtain a final prediction.

INTRODUCTION

The economic impacts of temporal changes in residential home prices are well documented. Changes in home prices play a significant role in determining homeowners' abilities to borrow and spend and therefore general economic conditions. Case and Shiller (2003, p. 304) state: "There can be no doubt that the housing market and spending related to housing sales have kept the U.S. economy growing and have prevented a double-dip recession since 2001." Earlier, Case et al. (2000) investigated the effects of the real estate prices on the U.S. economic cycles. The economic benefits of home building were measured in a study by the National Association of Home Builders (2005). They maintain that home building generates local income and jobs for residents and increases governments' revenues. The economic impacts of home-price changes are not unique to the U.S.

DOI: 10.4018/978-1-60566-898-7.ch001

housing market. For example, Ludwig and Torsten (2001) quantify the impact of changes in home prices on consumption in 16 OECD countries, and in Australia the government expected "moderating consumption growth as wealth effects from house price and share price movements stabilise" (Commonwealth of Australia, 2001). Changes in home prices in a relatively large economy (such as that of the U.S.) also affect economic conditions in others. For example, Haji (2007) discussed the impact of the U.S. subprime mortgage crisis on the global financial markets. Reports in the Chinese news (e.g., China Bystanders, 2008) reveal that banks in China are experiencing lower profits due to losses on trading mortgage-related securities. On August 30, 2007, a World Economy report published by the Economist stated that "subprime losses are popping up from Canada to China". On May 16, 2008, CNN Money.com (2008) published a summary report of the mid-year update of the U.N. World Economic Situation and Prospects 2008. The U.N. report stated that the world economy is expected to grow only 1.8% in 2008 and the downturn is expected to continue with only a slightly higher growth of 2.1% in 2009. The slow growth is blamed on further deterioration in the U.S. housing and financial sectors that is expected to "continue to be a major drag for the world economy extending into 2009." Given the economic impacts of changes in home prices, accurate predictions of future changes probably help project economic conditions better.

Since April 2006, forecasting home prices gained additional importance and therefore attention after the Chicago Mercantile Exchange (CME) began trading in futures and options on housing. Investors can trade the CME Housing futures contracts to profit in up or down housing markets or to protect themselves against market price fluctuations, CME Group (2007). Initially, prices of those contracts were determined according to indexes of median home prices in ten metropolitan statistical areas (MSAs): Boston, Chicago, Denver, Las Vegas, Los Angeles, Miami, New York, San Diego, San Francisco, and Washington, D.C. as well as a composite index of all 10 cities (CME, 2007). A second composite index was later introduced to include twenty metropolitan areas. Additionally, it is calculated for Atlanta, Charlotte, Cleveland, Dallas, Detroit, Minneapolis, Phoenix, Portland, Seattle, and Tampa as well as a composite index of all 20 MSAs. These are financial tools to trade U.S. real estate values and are based on the S&P/Case-Shiller Indexes (CSIs) for all 20 cities and the two composites. CSIs are recognized as "the most trustworthy and authoritative house price change measures available" (Iacono, 2008). Case and Shiller (1989 and 1990) presented early introduction of the index, its importance, and forecasts.

This chapter focuses on forecasting the S&P/Case-Shiller index for Los Angeles MSA (LXXR). Two reasons explain why LXXR is selected. First, modeling and predicting only one index is a challenge to be addressed first before tackling 20 indexes in different locations that are characteristically heterogeneous markets and before predicting either composite. Second, the Los Angeles area housing market is one of the hardest hit by the unprecedented subprime financial problems. The plan is to predict monthly percentage changes in LXXR for 24 months (April of 2008 through March of 2010). Monthly percentage change in LXXR = %D_LXXRt = 100*{Ln(LXXRt) - Ln(LXXRt-1)}, where Ln = natural logarithm, and t = 1, …, T months. (Hereon, %D_Xt = 100*{Ln(Xt) - Ln(Xt-1)}.) Input data used is monthly and covers the period from January of 1992 through March of 2008. The forecast is developed in stages. In the first, variables (Xi where i = 1, …, n) suspected of creating temporal variations in LXXR are logically and intuitively identified, data of those variables are collected, then variables that best explain variation in LXXR (Xj where j = 1, …, k and k ⊆ n) are determined using genetic programming (GP). Variables identified as best

in the first stage are forecasted for 24 months in the second. A multi-agent system (MAS) is finally developed to model the monthly percent change in LXXR (%D_LXXR). In this third stage, MAS is a network of computational techniques employed first to obtain several independent "best" forecasts of %D_LXXR. By assuming that each of the techniques employed captures the variable's dynamics over history (1992-2008) at least partially, a single agent then independently takes the forecasts produced (by the techniques employed) as input to produce a single forecast as output.

Ideally, the best forecast should be evaluated relative to others published in the literature. However, an extensive literature search failed to find any monthly forecast of LXXR or %D_LXXR. (Most probably this is the first study to model and predict LXXR monthly.) Only independent annual changes expected in CA were found. For example, the California Association of Realtors (C.A.R.) publishes an annual forecast for the entire state. Their quarterly forecast (C.A.R., 2008) projects a modest price increase in the second half of 2008 and in 2009. Their annual forecast (Appleton-Young, 2008) projects a decline of about 5% for 2008. A different forecast is produced by Housing Predictor (2008) who predicts that home prices will decline by 12.8% in 2008. Only two forecasts of the S&P/Case-Shiller indexes were found: Moody's Economy.com (2008) and Stark (2008). Moody's forecast is of the 20-city composite index and is only presented graphically. It suggested that housing prices will continue to decline through the middle of 2009 when they are expected to start bottoming out. Stark (2008) predicted that the composite index will decline by 12% in 2008, decline by 0.3% in 2009, and increase by 3.8% in 2010. Only one city forecast of CSI was found. The BostonBubble.com (2007) published a forecast of the index for Boston MSA through 2011. They project that the Boston CSI will continue to decline until April of 2010.

The balance of this chapter has four sections. Before describing the methodology used to produce a forecast of %D_LXXR using multi-agent systems, the S&P/Case-Shiller Index is briefly introduced. Estimation results are presented next followed by the forecasts obtained. The final section has the conclusion.

THE S&P/CASE-SHILLER INDEX

The Case/Shiller indexes are designed to measure changes in the market value of residential real estate in each Metropolitan Statistical Area (MSA) by tracking the values of single-family housing within the United States. It measures changes in housing prices with homes sold held at a constant level of quality by utilizing data of matched sale pairs for pre-existing homes. In short, its calculation is based upon repeat sales of existing homes. For each MSA, a three-month moving average is calculated. The monthly moving average is of sales pairs found for that month and the preceding two months. A Standard & Poor's report (2008a) contains a full description of how the indexes are calculated. The indexes are published by Standard & Poor's (2008b). Without going into details here, LXXR is designed to measure changes in the total value of all existing single-family housing stock in the Los Angeles Metropolitan Statistical Area, which includes Los Angeles, Long Beach, and Santa Ana.

The Los Angeles MSA index (LXXR) depicts the sharp declines in housing prices experienced in 2007 and 2008. Changes in housing prices around the Los Angeles area were much stronger than most of the nation. Figure 1 shows a comparison between the monthly percentage changes in LXXR and the 10- and the 20-composite indexes (CSXR and SPCS20R) over the period 1992-2008. The more aggressive volatility in %D_LXXR (higher % increases and more pronounced % decreases) relative to the two composite indexes is evident.
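The comparison in Figure 1 rests on the monthly percentage-change transformation defined in the introduction. The short Python sketch below is an illustration only (it is not code from the chapter, and the index values and dates are made-up placeholders, not actual CSI data); it shows how %D_Xt = 100*{Ln(Xt) - Ln(Xt-1)} would be computed for a series such as LXXR or the composites before plotting such a comparison.

```python
# Minimal sketch (not from the chapter): the monthly percentage-change
# transformation %D_X_t = 100*(Ln(X_t) - Ln(X_{t-1})).
# The index values and dates are placeholders, not actual CSI data.
import numpy as np
import pandas as pd

lxxr = pd.Series([170.2, 172.9, 175.1, 173.8, 171.4],
                 index=pd.period_range("2007-11", periods=5, freq="M"))

def pct_log_change(series: pd.Series) -> pd.Series:
    """Monthly percentage change defined as 100*(ln(X_t) - ln(X_{t-1}))."""
    return 100.0 * np.log(series).diff()

print(pct_log_change(lxxr).dropna())
```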

Figure 1.

METHODOLOGY

Modeling the percentage change in LXXR (or %D_LXXR) is rather challenging given the dynamics portrayed in Figure 1. Seasonal patterns as well as irregular variations are evident. Given such nonlinear complexity of temporal variations in %D_LXXR, it is reasonable to assume that employing several types of modeling techniques simultaneously can deliver a more reliable forecast of the variable than one delivered by a single technique. Each modeling technique can be viewed as an autonomous agent, with one common goal for all: minimize each fitting model's mean square error (MSE). It is assumed here that the best models the agents construct produce forecasts that may be successful during given periods and not in others. If this is the case, a logical modeling strategy would be to use the outputs of those best models as inputs for others and re-estimate and forecast new ones. Thus, a multi-agent system is constructed to capture interactions between the obtained best models. The objective is to determine whether a multi-agent system can produce a single forecast that outperforms any single agent prediction.

In constructing models to forecast %D_LXXR, all available data (January 1992 through March 2008) is used. Typically, a small sample of the data is withheld to determine the forecasting efficacy of a model. Instead, it is assumed here that even if a model succeeds in producing an impressive ex post forecast (i.e., a forecast of outcomes already known) there is no guarantee that it will produce a reliable ex ante forecast (i.e., a forecast of unknown outcomes). Further, under conditions when market volatility is evident, it is more logical to use the most recent information (or data) to obtain models than to use such information in testing them. Given that the objective is to ultimately forecast LXXR for a period of two years (or 24 months), withholding any of the data to test the models may be at the cost of obtaining relatively larger forecast errors toward the end of the 24-month period.

In the multi-agent system utilized here, several modeling techniques or agents are employed. Bio-inspired computational techniques (genetic programming or GP and artificial neural networks or ANN) and a statistical technique (a regression model or RM using standard ordinary least squares or OLS) are candidate estimation and prediction agents. An independent model will first be produced by each technique. The combined fitted values are then the input variables into a GP, an

ANN, and an OLS to obtain future (predicted) values of %D_LXXR.

Use of multi-agent systems when developing models that explain dynamics of systems is not new. Chen and Yeh (1996) used GP learning of the cobweb model. Barto (1996) used ANN multi-agent reinforcement learning. Chen and Tokinaga (2006) used GP for pattern-learning. Vanstone and Finnie (2007) used ANN for developing a stock market trading system. As mentioned in the introduction, modeling %D_LXXR to forecast LXXR is completed in three stages. Before describing each stage, brief introductions to GP and ANN follow.

Genetic Programming

GP is an optimization search technique. Koza (1992) provides foundations of GP. Examples of its use in forecasting are in Chen and Yeh (1995), Tsang et al. (1998), and Warren (1994). The GP software used in this study is TSGP (for Time Series Genetic Programming), written in C++ for the Windows environment, and runs on a standard PC (Kaboudan, 2004). TSGP is used because it is designed specifically to produce regression-type models and to compute standard regression statistics. Statistical properties of models TSGP produces were analyzed in Kaboudan (2001). Two types of input files are needed for executing TSGP: data files and a configuration file. Data input files contain values of the dependent and each of the independent variables. The configuration file contains execution information a user controls. TSGP assembles an initial population of individual equations (say 1000 of them) with random specifications, computes their fitness (MSE = mean squared error), and then breeds new equations as members of a new generation with the same population size. Each individual – member of a population – is a regression-like model represented by a parse tree. The key run parameters specified in the configuration file are: population size = 1000, fitness measure = MSE, maximum tree depth = 100, maximum number of generations = 100, mutation rate = 0.6, crossover rate = 0.3, self reproduction = 0.10, and operators = +, -, x, /, sin, & cos.

GP-evolved equations are in the form of a parse tree. Trees are randomly assembled such that if an operator is selected, the tree grows. Operators are thus its inner nodes. A tree continues to grow until end nodes (or terminals) contain variables or constant terms. Once a population of equations is assembled, a new generation is then bred using mutation, crossover, and self reproduction. Fitter equations in a population get a higher chance to participate in breeding. In mutation, a randomly assembled sub-tree replaces a randomly selected existing part of a tree. In crossover, randomly selected parts of two existing trees are swapped. In self reproduction, a top percentage of the fittest individuals in one population (usually the top 10%) are passed on to the next generation. For all bred individuals, if the offspring are fitter than their parents, they survive; else the parents survive. The idea in GP is to continue generating new populations while preserving "good genes" in a Darwinian sense. After completing a specified number of generations (100 to 200), the program terminates and saves the fittest model to an output file. Actual, fitted, and forecasted values, residuals, as well as evaluation statistics (R2, MSE, and MAPE = mean absolute percent error) are written to another output file.

A GP algorithm has its characteristics. The program randomly selects the explanatory variables and the coefficients. The iterative process produces coincidental very strange specifications. It is based on heuristics and lacks theoretical justification. Further, during execution the computerized algorithm occasionally gets trapped at a local minimum MSE in the search space and never reaches a global one. This necessitates conducting a large number of searches (say 100) to find the 100 fittest equations. One or more of them should actually produce a superior fit (and forecast) that may not be otherwise obtainable.
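To make the breeding cycle just described more concrete, the following is a minimal, self-contained Python sketch of GP-style symbolic regression: random parse trees built from the listed operators, MSE as the fitness measure, and mutation, crossover, and self reproduction of an elite. It is an illustration only, not the TSGP software used in the chapter; the synthetic target function, population size, depths, and rates are arbitrary choices made for the example.

```python
# Minimal sketch of the GP mechanics described above (NOT the TSGP package).
import math
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
       "*": lambda a, b: a * b,
       "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0}   # protected division
UNARY = {"sin": math.sin, "cos": math.cos}

def random_tree(depth, n_vars):
    """Grow a random expression tree; leaves are variables or constants in [-128, 127]."""
    if depth == 0 or random.random() < 0.3:
        return ("x", random.randrange(n_vars)) if random.random() < 0.7 \
            else ("c", random.randint(-128, 127))
    if random.random() < 0.25:
        return (random.choice(list(UNARY)), random_tree(depth - 1, n_vars))
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1, n_vars), random_tree(depth - 1, n_vars))

def evaluate(node, x):
    tag = node[0]
    if tag == "x":
        return x[node[1]]
    if tag == "c":
        return float(node[1])
    if tag in UNARY:
        return UNARY[tag](evaluate(node[1], x))
    return OPS[tag](evaluate(node[1], x), evaluate(node[2], x))

def mse(tree, X, y):
    return sum((evaluate(tree, xi) - yi) ** 2 for xi, yi in zip(X, y)) / len(y)

def mutate(tree, n_vars):
    """Replace one branch with a freshly grown random sub-tree."""
    if tree[0] in ("x", "c"):
        return random_tree(2, n_vars)
    return (tree[0], random_tree(2, n_vars)) + tuple(tree[2:])

def crossover(a, b):
    """Graft a top-level branch of b into a (a crude stand-in for sub-tree swapping)."""
    if a[0] in ("x", "c") or b[0] in ("x", "c"):
        return a
    return (a[0], b[1]) + tuple(a[2:])

# Synthetic data standing in for a dependent variable and two regressors.
X = [(random.uniform(-3, 3), random.uniform(-3, 3)) for _ in range(100)]
y = [math.sin(x0) + 0.5 * x1 for x0, x1 in X]

pop = [random_tree(4, 2) for _ in range(200)]
for gen in range(30):
    pop.sort(key=lambda t: mse(t, X, y))
    elite = pop[:20]                              # self reproduction of the top 10%
    children = []
    while len(elite) + len(children) < len(pop):
        p1, p2 = random.sample(pop[:100], 2)      # fitter equations breed more often
        child = crossover(p1, p2)
        if random.random() < 0.6:                 # illustrative mutation rate
            child = mutate(child, 2)
        children.append(child)
    pop = elite + children
pop.sort(key=lambda t: mse(t, X, y))
print("best training MSE:", mse(pop[0], X, y))
```

As in the procedure described above, repeating such a run many times with different random seeds and keeping the fittest equations is what guards against runs that stall at a local minimum.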

Perhaps this explains why GP-evolved equations have strong predictive abilities.

GP and conventional statistical regressions have differences. Because equation coefficients are not computed (they are computer-generated random numbers between -128 and 127 for TSGP), problems of multicollinearity, autocorrelation, and heteroscedasticity are nonexistent. Further, there are also no degrees of freedom lost when more explanatory variables are added.

Neural Networks

Artificial neural networks (ANN) have been around for more than thirty years now. Principe et al. (2000) among many others describe how ANN is used in forecasting. The most commonly used neural network structures are multilayer perceptrons (MLP) and generalized feedforward networks (GFF). MLP is a layered feedforward network that learns nonlinear function mappings. It employs nonlinear activation functions. Networks are typically trained with static backpropagation and require differentiable, continuous nonlinear activation functions such as the hyperbolic tangent or sigmoid. A network takes explanatory variables as input to learn how to produce the closest fitted values of a dependent variable. Although MLP trains slowly and requires a large number of observations, it seems to approximate well. GFF is a generalization of MLP with connections that jump over layers. GFF also trains with static backpropagation. MLP is used in this study to produce forecasts. The forecasts were produced here using NeuroSolutionsTM software (2002).

Parameters used to complete ANN searches here are determined by trial and error. To obtain a forecast, first, several configurations of MLP are tested to identify the suitable network structure to use. Hidden layers use the hyperbolic tangent transfer function, employ 0.70 learning momentum, and train a minimum of 1000 epochs. First, searches are completed using one hidden layer. Testing those starts with 1000 training epochs; then the number of epochs is increased by increments of 500 until the best network is identified. The search is repeated using networks with two hidden layers if no reasonably acceptable output is obtained. The configuration with the best estimation statistics is then used in forecasting. The fitness parameter is MSE to be consistent with GP.

Stage I: Determining the Explanatory Variables

Determining variables to employ in explaining the dynamics of changes in LXXR and to help predict its future variations depends on correctly identifying possible candidate variables and availability of data. Identifying a logical set of these variables is guided here by economic theory. Carefully evaluating factors (in this case, measurable variables) that impact the demand and supply sides of the housing market furnishes a good number of variables that logically impose possible upward or downward price pressures. Rather than constructing a system of market equilibrium that encompasses separate demand and supply functions, a system of market disequilibrium (Fair and Jaffee, 1972) that befits the housing market conditions is assumed. Under disequilibrium conditions, the market rarely ever reaches equilibrium. This means that prevailing quantities traded and transaction prices may be anywhere on a disequilibrium trading curve where the quantity traded is the lesser of the quantities demanded and supplied. (This accommodates the nature of housing markets where there are always more homes to sell than those bought. To account for what might seemingly be perpetual excess inventory in the housing market, an assumption is made that the market has an average inventory level; and excess supply or excess demand would be that number of homes for sale above or below that average inventory level.) Therefore, a single price trading equation that combines demand and supply determinants should be sufficient under disequilibrium conditions. Economic theory suggests including demand determinants such as income

and mortgage rate and supply determinants such as construction cost and changes in inventory, for example, in the equation.

The identified complete set of explanatory variables is then Xi for i = 1, …, n possible demand and supply determinants as well as their lagged values. Lagged explanatory variables are logical since the decision making process when buying a home tends to be rational and conditional upon verification of income, inspection of homes, among other transaction-completing progression routines that can take anywhere from three months to a year. To determine Xj for j = 1, …, k (that subset of Xi) variables to select when modeling changes in LXXR, a single agent is employed. It is a GP regression-like model-assembler. That agent is given all possible variables identified as candidates to explain variation in %D_LXXR. Its task is to generate a number of promising models. Thus, given Xi, a GP agent will evolve a reasonably good number of equations (say 200) first. The fittest of those equations (say the best 10%) are identified. Explanatory variables included in these equations are then harvested to be used in re-estimating models of %D_LXXR employing GP as well as other agents.

The idea of selecting the best explanatory variables using GP is new. The process starts with all the identified variables and their lagged values (with λ = 3, 13, …, 24 monthly lags considered and included). The GP agent then generates models to explain historical variations in the dependent variable %D_LXXR. Specifications of each of the 20 best evolved models (10% of all GP models evolved) are obtained and whatever variables they contain are tabulated. Any variable in an equation gets one vote regardless of the number of times it appears in that same equation. This means that the number of votes a variable gets is the number of GP models that it appears in, and therefore, the maximum number of votes a variable can have is 20. The votes are tallied and those variables repeatedly appearing in the equations are selected to employ when searching for the final %D_LXXR forecasting model. Only those variables occurring often in the generated GP equations are reported here (given that there is no obvious benefit from discussing others). The GP agent determined that variations in %D_LXXR are best explained by: COL = cash out loans, or the percent of the amount borrowed above the amount used to finance a purchase (Freddie Mac, 2008a). FS = number of houses for sale in thousands (U.S. Census Bureau, 2008a). SOLD = number of units sold in thousands (U.S. Census Bureau, 2008a). ES = excess supply = FSt-1 – SOLDt. CMI = construction material index (U.S. Census Bureau, 2008b). CHI = change in housing inventory = FSt – FSt-1. MR = 30-year real mortgage rate (Freddie Mac, 2008b). LOAN = indexed loan = LXXR * LPR (LPR = loan-to-price ratio). LAPI = Los Angeles real personal income (U.S. Bureau of Economic Analysis, 2008). ESDV = excess supply dummy variable, where ESDVt = 1 if ESt < average ESt and = zero otherwise. These variables are taken at different lags λ = 3, …, 24 months: COLt-λ, FSt-λ, SOLDt-λ, ESt-λ, CMIt-λ, CHIt-λ, MRt-λ, LOANt-λ, and LAPIt-λ. All variables measured in dollar values were converted into 1982 real or constant dollars using the LA metropolitan area consumer price index (CPI).

Stage II: Predicting the Explanatory Variables

In this stage, the forecast values of each explanatory variable (Xj) determined in the first stage are obtained employing two agents, GP and ANN. Agents responsible for obtaining forecasts of the Xj are each given a set of appropriate variables (Zv) to explain variations in each previously identified X variable. Alternatively, Xj = f(Zvλ), where Zvλ for v = 1, …, V explanatory variables and λ = 3, …, L lags. Results from the two agents are compared and a decision is made on whether to take the better one or take their average. The two agents are assumed competitive if the results of one of them are deemed superior to the other.
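The competitive/cooperative choice between the GP and ANN agents for each explanatory variable can be expressed compactly. The Python sketch below is an assumed reading of that rule (the chapter states it qualitatively; the relative-MSE tolerance used here is a hypothetical threshold, not a value taken from the text).

```python
# Assumed reading of the double-agent selection scheme (tolerance is hypothetical).
import numpy as np

def combine_agent_forecasts(mse_gp, mse_ann, fc_gp, fc_ann, tol=0.10):
    """Pick the GP or ANN forecast for one explanatory variable, or average them.

    Competitive: the agent with the clearly lower training MSE wins.
    Cooperative: if the MSEs are within `tol` (relative), average the forecasts.
    """
    fc_gp, fc_ann = np.asarray(fc_gp, float), np.asarray(fc_ann, float)
    if abs(mse_gp - mse_ann) <= tol * min(mse_gp, mse_ann):
        return 0.5 * (fc_gp + fc_ann)                 # cooperative
    return fc_gp if mse_gp < mse_ann else fc_ann      # competitive

# Example with made-up numbers for one variable's 24-month forecast:
chosen = combine_agent_forecasts(6.10, 3.20, np.zeros(24), np.ones(24))
print(chosen[:3])
```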

Figure 2.

Superiority is determined according to training fitness (MSE). They are assumed cooperative if there is no difference between their performances and an average of them is taken. Figure 2 depicts the flow chart of the process used to obtain predictions for only one variable (X1).

The same process is repeated for all the other explanatory variables (X2, …, Xk). The difference is in the variables used to capture the dynamics of each Xj and the final decision involving determining the best forecast after determining whether to employ a competitive or cooperative double-agent solution selection scheme. Given that two of the explanatory variables are identities (i.e., computed using other variables) and one is a dummy variable, the number of models to estimate is seven rather than ten. The specifications of the seven models identified as best in predicting the explanatory variables are:

COLt = f(MRt-6, …, MRt-18); (1)

%D_MRt = f(%D_MRt-6, …, %D_MRt-18); (2)

FSt = f(FSt-1,12, COLt-1,12, LPRt-1,12); (3)

%D_SOLDt = f(%D_MRt-6,12, %D_LAPIt-1,12); (4)

CMIt = f(CMIt-3,12); (5)

%D_LOANt = f(%D_MRt-3,18); (6)

%D_LAPIt = f(%D_LAPIt-3,6, %D_CPIt-12,16, %D_HWRt-12,16). (7)

In the equations above, %D_X = % change in variable X and Xt-a,b = Xt-a, Xt-(a+1), …, Xt-b, where a and b are integer measures of temporal distances. Using lagged values helps capture the evolutionary

Table 1. Estimation statistics of fitted explanatory variables

Variable MSE R2 Agent


%D_COL 7.39 0.94 ANN
%D_MR 6.10 0.26 GP
FS 2.98 0.61 ANN
%D_SOLD 18.81 0.50 ANN
CMI 9.84 0.96 GP
%D_LOAN 1.31 0.38 ANN
%D_LAPI 0.06 0.71 ANN


dynamics embedded in each variable's historical values. Two new variables were introduced in the equations above: LPR and HWR. HWR is the average monthly hourly wage in Los Angeles.

Estimation statistics belonging to the models that generated forecasts of the seven explanatory variables (%D_COL, %D_MR, FS, %D_SOLD, CMI, %D_LOAN, %D_LAPI) are in Table 1. To produce lengthy forecasts, Equations (1) through (7) were used to forecast three months (the least lag in an equation) first. The three-month forecast was then used as input to solve each respective equation again to obtain the next three months' forecast, and so on. Using this iterative process, a 24-month forecast of each of the explanatory variables was produced.

Figure 3. Flow chart starting with X1 explanatory input variables into each modeling technique. Solutions from the different models are identified by 'S' (GP S, ANN S, ..., etc.). Respective MSE computations determine the best model each technique produces. ANN is then used to estimate models that fit residuals output from GP and from RM.

Stage III: Multi-Agent Modeling of %D_LXXR

In this stage, several agents are employed to produce forecasts of %D_LXXR and LXXR. Figure 3 portrays the flow of the implementation process. Historical as well as forecasted values of the explanatory variables are input to three techniques selected to produce %D_LXXR predictions. The three techniques are GP, ANN, and linear regression models (RM or OLS). All three are multivariate and they take exogenous variables as inputs. As shown in the figure, the Xj variables feed into the model generating techniques. Each technique acts as an agent whose task is to produce many solutions. The best solutions provided by the techniques then compete to determine the best forecast.

Solutions obtained using the different techniques remain competitive until all best solutions are identified. They act as cooperative agents to help deliver the best possible forecast in the final step. Cooperation is in two ways. The first involves fitting the residuals (= Actual – Fitted) from one technique using a different technique. Because ANN produced the lowest MSE relative to GP and RM at the competitive level, ANN was used to fit the residuals the other two produced. The idea assumes that whatever dynamics one technique missed (i.e., the residuals) may be captured using a different technique. The second cooperation involves using all outputs obtained from the different techniques as inputs to model some type of weight distribution between them and hopefully capture what may be the best forecast.
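A rough sketch of these two cooperation modes follows, using scikit-learn stand-ins (an MLP for the ANN agent and LinearRegression for OLS); the chapter itself used NeuroSolutions, TSGP, and standard OLS, and the data below are synthetic, so this is an illustration of the idea rather than the authors' implementation.

```python
# Sketch of the two cooperation modes described above (stand-in tools, synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 5))                 # stand-in for lagged explanatory variables
y = X @ np.array([0.4, -0.2, 0.1, 0.0, 0.3]) + 0.1 * rng.normal(size=180)

# Independent agent: an OLS fit of the dependent variable on the explanatory variables.
ols_fit = LinearRegression().fit(X, y).predict(X)

# Cooperation 1 (complementing): fit the OLS residuals with an ANN and add the
# fitted residuals back, giving a two-agent "OLS+ANN" fit.
resid = y - ols_fit
ann_on_resid = MLPRegressor(hidden_layer_sizes=(5,), activation="tanh",
                            max_iter=5000, random_state=0).fit(X, resid)
ols_ann_fit = ols_fit + ann_on_resid.predict(X)

# Cooperation 2 (reconciliation): regress the dependent variable on the candidate
# fits to estimate a weight distribution between them, as just described.
stacked = np.column_stack([ols_fit, ols_ann_fit])
recon = LinearRegression().fit(stacked, y)
print("reconciliation weights:", recon.coef_, "intercept:", recon.intercept_)
```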

Figure 4.


ESTIMATION RESULTS

Only the final estimation results and their statistics are presented here. Forecasts are presented in the next section. Using the same input variables, an extensive search is employed until the best possible output statistics are reached independently by each technique. ANN does not produce an estimated equation, and only the best GP and RM-OLS estimated equations can be reported.

A search consisting of a total of 100 GP models was conducted. The best (lowest MSE) GP equation among the 100 attempted produced the fit shown in Figure 4 (a) and was as follows:

Y = cos{sin[X1t-3 * {cos(X1t-6) + X1t-3}] - X1t-6} * [ sin[ sin(X2t-5) + {X3t-9 * sin(X2t-5 + X1t-3)} ] + { X3t-9 * [ {sin(X2t-5) + DVt + X3t-9} + { sin[X2t-5 + X1t-3 * { cos(X1t-6) + X1t-3} ] – sin{X1t-3 * [cos{sin(X2t-5) + DVt} + X1t-3] } + X1t-6} ] }] - DVt - 2 * X4t (8)

where (and for aesthetic reasons), Y = %D_LXXR, X1 = CHI, X2 = %D_SOLD, X3 = COL, X4 = %D_MR, and ESDV = DV. The equation above had R2 = 0.79 and MSE = 0.298.

As mentioned earlier, the ANN search involved a trial-and-error routine to find the best fit. Ultimately, the best model was produced by a multilayered perceptron system with a layered feedforward network trained with static backpropagation. The best results were obtained using a single hidden layer with five input processing elements, a TanhAxon transfer function, and the momentum learning rule (with step size = 1.0 and momentum = 0.7), with 6500 epochs, and was obtained after three training times. The best ANN fit obtained is shown in Figure 4 (b). It had R2 = 0.99 and MSE = 0.006.

A trial-and-error procedure was also used to find the best OLS regression model. Explanatory variables were added and deleted and their lags varied until the best fit was found. Interestingly, only one extra explanatory variable (X5 = LOAN) was added to the list the final GP model contained. The best OLS equation found (with R2 = 0.75 and MSE = 0.36) was as follows:

Y = 7.49 - 39.41 X4t-6 - 0.74 X3t-5 + 0.55 X2t-11 + 3.40 Ln(X5t-9) - 3.87 X5t-12 - 1.47 X1t-1 + 0.42 X1t-6. (9)
(0.75) (14.3) (0.26) (0.18) (0.93) (0.77) (0.17) (0.22)

In (9), the figures in parentheses are the estimated coefficients' standard errors. These statistics suggest that all estimated coefficients are statistically different from zero at the 5% level of significance. Figure 4 (c) compares actual with OLS fitted values.

The results from the three techniques used suggest that ANN (with the lowest MSE) should generate the best forecast. Logically then, GP and OLS residuals (Actual – Fitted) may be predictable using ANN. Thus, to capture the GP and OLS models' unexplained variation in %D_LXXR, ANN was used to model them. The resulting fitted residuals were then added to the originally obtained %D_LXXR fit to produce the new cooperative two-agent fits and forecasts. Two ANN systems were developed using the exact same explanatory variables with the dependent variable being GP residuals once and OLS residuals the second. Here are the results from estimating the two combinations performed:

ANN estimation of the GP residuals to obtain GP+ANN: R2 = 0.97, MSE = 0.008
ANN estimation of the OLS residuals to obtain OLS+ANN: R2 = 0.95, MSE = 0.02

Additional models were then obtained by reconciliatory cooperation. Reconciliatory cooperation entails employing all results obtained thus far as input variables to produce the final fits and forecasts. Alternatively, three agents (GP, ANN, and OLS) take as inputs the five solutions produced using GP, ANN, OLS, GP+ANN, and OLS+ANN

Table 2. %D_LXXR forecasts obtained by independent, complementing, and reconciliatory agents

Independent Complementing Reconciliatory


ANN GP OLS GP+ANN OLS+ANN GP_R ANN_R OLS_R Ave
A-08 -3.96 -3.02 -3.29 -3.06 -4.68 -3.59 -3.44 -3.56 -3.53
M-08 -3.98 -1.09 -4.31 -1.84 -4.96 -3.08 -2.41 -3.00 -2.83
J-08 -3.66 -3.76 -3.95 -4.21 -3.70 -3.89 -3.60 -3.92 -3.81
J-08 -1.53 -2.45 -1.41 -3.10 -1.06 -2.19 -2.29 -2.25 -2.24
A-08 -0.67 -3.37 -0.26 -3.60 1.06 -1.90 -2.16 -2.02 -2.02
S-08 -2.32 -3.80 -0.49 -4.46 0.64 -3.22 -3.16 -3.32 -3.23
O-08 -3.71 -2.28 -1.30 -2.53 -0.36 -3.22 -2.42 -3.18 -2.94
N-08 -1.41 -3.04 -1.75 -2.53 -0.40 -1.88 -1.97 -1.93 -1.93
D-08 0.11 -2.96 -1.66 -1.66 -0.14 -0.63 -0.77 -0.70 -0.70
J-09 -0.89 -0.70 -0.96 0.36 -1.01 -0.37 -0.42 -0.31 -0.37
F-09 -0.79 -0.35 0.00 0.84 -1.26 -0.11 -0.23 -0.03 -0.12
M-09 0.76 -0.39 -0.04 0.40 -1.52 0.61 0.42 0.61 0.54
A-09 1.34 -0.63 -0.59 1.05 0.65 1.22 1.43 1.22 1.29
M-09 -0.27 -2.45 -1.33 -0.77 -0.55 -0.48 -0.58 -0.49 -0.52
J-09 -0.83 -0.98 -0.38 0.63 0.51 -0.22 -0.07 -0.15 -0.14
J-09 0.47 0.38 0.36 2.53 1.16 1.33 1.48 1.43 1.41
A-09 1.29 -1.16 0.48 0.65 0.93 1.02 1.33 1.01 1.12
S-09 1.19 0.10 0.02 0.89 0.20 1.07 1.10 1.07 1.08
O-09 -0.39 -0.96 -0.52 -0.06 -0.53 -0.25 -0.28 -0.23 -0.25
N-09 0.49 -0.47 -0.43 0.54 -0.51 0.51 0.48 0.53 0.51
D-09 0.86 -1.85 -0.78 0.12 -0.99 0.55 0.50 0.53 0.53
J-10 -1.02 -3.43 -1.31 -4.23 -1.54 -2.37 -3.06 -2.50 -2.64
F-10 -1.97 -2.04 -1.58 -3.19 -2.49 -2.48 -2.61 -2.53 -2.54
M-10 -1.74 -3.33 -1.51 -4.43 -2.47 -2.87 -3.41 -2.99 -3.09

modeling algorithms to produce new models and estimates. The final best GP-reconciliation model (GP_R) found is:

%D_LXXR = 0.4194 GPNN + 0.5806 NN (10)

where GPNN = estimated values obtained from the GP+ANN combination and NN are the values produced using ANN alone. The equation above has only two right-hand-side variables because GP produced less desirable models otherwise when other variables were selected. The OLS-reconciliation model (OLS_R) also produced the best outcome when the same two variables were employed. The best OLS_R model found is:

%D_LXXR = 0.011 + 0.463 GPNN + 0.543 NN. (11)
(0.006) (0.053) (0.052)

The estimation MSE statistics of the cooperative models were almost identical: GP_R MSE = 0.005, OLS_R MSE = 0.004, ANN_R MSE =

Table 3. LXXR forecasts obtained by independent, complementing, and reconciliatory agents

Independent Complementing Reconciliatory


ANN GP OLS GP+ANN OLS+ANN GP_R ANN_R OLS_R Ave
A-08 199.06 200.95 200.40 200.40 197.65 200.31 198.63 200.75 199.90
M-08 191.29 198.77 191.96 196.75 188.08 195.54 192.60 194.81 194.32
J-08 184.42 191.43 184.52 188.63 181.25 188.62 185.25 187.31 187.06
J-08 181.62 186.81 181.94 182.87 179.34 184.35 181.24 183.14 182.91
A-08 180.42 180.61 181.47 176.41 181.24 180.42 177.84 179.48 179.24
S-08 176.27 173.88 180.59 168.71 182.41 174.81 172.20 173.62 173.54
O-08 169.85 169.96 178.25 164.50 181.76 170.62 166.75 168.20 168.52
N-08 167.47 164.87 175.17 160.39 181.03 167.29 163.64 164.99 165.31
D-08 167.65 160.05 172.29 157.74 180.78 166.01 162.61 163.84 164.15
J-09 166.16 158.94 170.63 158.32 178.97 165.31 162.02 163.34 163.56
F-09 164.85 158.40 170.63 159.65 176.73 164.93 161.84 163.29 163.36
M-09 166.11 157.78 170.57 160.29 174.06 165.62 162.83 164.28 164.25
A-09 168.35 156.78 169.57 161.98 175.19 168.01 164.82 166.30 166.38
M-09 167.89 152.99 167.32 160.75 174.24 167.04 164.04 165.49 165.52
J-09 166.51 151.49 166.67 161.76 175.12 166.92 163.69 165.25 165.29
J-09 167.29 152.08 167.28 165.91 177.16 169.40 165.88 167.64 167.64
A-09 169.46 150.32 168.08 166.99 178.81 171.67 167.58 169.34 169.53
S-09 171.50 150.47 168.11 168.49 179.18 173.58 169.38 171.17 171.38
O-09 170.84 149.03 167.24 168.39 178.24 173.09 168.96 170.78 170.94
N-09 171.68 148.34 166.52 169.30 177.33 173.92 169.83 171.69 171.81
D-09 173.17 145.61 165.22 169.50 175.57 174.78 170.77 172.60 172.72
J-10 171.41 140.70 163.08 162.48 172.90 169.52 166.77 168.34 168.21
F-10 168.07 137.87 160.52 157.38 168.64 165.15 162.69 164.13 163.99
M-10 165.17 133.35 158.11 150.56 164.53 159.62 158.09 159.30 159.00

0.008, and their average MSE = 0.005. All four had R2 = 0.99.

FORECAST RESULTS

Forecast results and their statistics are compared in this section. The three reconciliatory attempts produced similar forecasts. Table 2 shows the predicted monthly %D_LXXR produced by all agents and the average produced by the final three reconciliation agents. Although the rate of decline in prices is expected to decrease in 2008, prices start to increase marginally in 2009, but decline again early in 2010. Prices are expected to decrease by 7.5% in the third quarter of 2008 and decrease again by 5.56% in the fourth quarter. They are expected to increase by about 3% in the third quarter of 2009, but decrease by 3.24% during the first quarter of 2010.

Table 3 presents forecasted LXXR values obtained using the %D_LXXR predictions where the predicted LXXR values are computed as follows:

Figure 5.

LXXRt = exp(%D_LXXRt/100 + Ln(LXXRt-1)) (12)

Given that the results of the agents are almost identical, Figure 5 presents the plots of predicted %D_LXXR and LXXR averages only. Predictions of housing price indexes in the Los Angeles metropolitan area through March of 2010 shown in Tables 2 and 3 are rather gloomy. Prices are expected to reverse to their levels during the third quarter of 2003 by the first quarter of 2010.
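Equation (12) simply cumulates the forecasted log percentage changes back into index levels. A small Python sketch follows (the starting level is a placeholder, not a figure taken from the chapter; the three changes are the first entries of the Ave column in Table 2):

```python
# Sketch of Equation (12): converting %D_LXXR forecasts back into index levels.
# The starting level (207.0) is a placeholder; the changes are from Table 2 (Ave).
import math

def rebuild_index(last_level, pct_log_changes):
    """Apply LXXR_t = exp(%D_LXXR_t / 100 + ln(LXXR_{t-1})) recursively."""
    levels, prev = [], last_level
    for d in pct_log_changes:
        prev = math.exp(d / 100.0 + math.log(prev))
        levels.append(prev)
    return levels

print(rebuild_index(207.0, [-3.53, -2.83, -3.81]))
```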

Efficacy of the forecast produced can be evaluated by comparing the forecast obtained here (that was available almost a year before the actual values materialized) with the actual LXXR values published after this research was completed. Of the 24-month period forecast values produced by the different agents as well as cooperation between them shown in Table 3, twelve months of actual values became available prior to publishing this work. The prediction MAPE values for that period are 3.83%, 3.41%, 2.88%, 5.05%, 4.88%, 3.02%, 4.37%, 3.51%, and 3.55% for GP, ANN, OLS, GP+ANN, OLS+ANN, GP_R, ANN_R, OLS_R, and the reconciliated average, respectively. Although the OLS independent forecasts appear best (with prediction MAPE = 2.8%), they are not statistically different from the GP_R forecasts (with prediction MAPE = 3%). Testing the null hypothesis that the mean difference between them is zero generated a t-statistic = -0.18 and p-value = 0.86. This means that the null (i.e., the hypothesis that the difference is zero, or that there is no difference between the two forecasts) cannot be rejected at any reasonable level of significance. A plot of the actual materialized LXXR against the best forecast for April 2008 through March 2009 is in Figure 6.

Figure 6.

CONCLUSION

This chapter introduced novel thought on utilizing agent-based modeling to forecast the Los Angeles metropolitan area S&P/Case-Shiller Index for the 24 months April 2008 through March 2010. The construction of agent-based modeling was based on progression defined in three stages. In the first, what may be perceived as the best variables that would explain variations in the monthly percentage change in the Los Angeles index were identified using a GP agent. In the second stage, GP and ANN agents were used to produce forecasts of the input variables identified in the first stage. In the third, agents competed and cooperated to produce a set of forecasts from which the final forecast was obtained.

In the final analysis, the forecasts obtained by the employed agents were marginally different and can be viewed as too similar to select one as best. The average forecast obtained using cooperation between the different agents employed is rather acceptable and as good as any. Actual index values published after the forecasts were completed (for April 2008 through March 2009) suggest that the OLS independent agent and the GP-reconciliation forecasts were more consistent with reality and are not statistically significantly different. Generally, the forecasts produced had a

strong similarity between them. Given the strong Case, K., & Shiller, R. (1990). Forecasting prices
similarity between them, it is easy to conclude and excess returns in the housing market. American
that the Los Angeles metropolitan area housing Real Estate and Urban Economics Association
market will remain depressed until the end of the Journal, 18, 263–273. doi:.doi:10.1111/1540-
forecast period considered in this research. Prices 6229.00521
will stabilize in 2009 but resume their decline
Case, K., & Shiller, R. (2003). Is there a bubble
early in 2010.
in the housing market? Brookings Papers on
Economic Activity, 1, 299–342. doi:.doi:10.1353/
eca.2004.0004
REFERENCES
Chen, S., & Yeh, C. (1995). Predicting stock re-
Appleton-Young, L. (2008). 2008 real estate turns with genetic programming: Do the short-run
market forecast. California Association of Re- nonlinear regularities exist? In D. Fisher (Ed.),
altors. Retrieved December 2008, from http:// Proceedings of the Fifth International Workshop
bayareahousingreview.com/wp-content/up- on Artificial Intelligence and Statistics (pp. 95-
loads/2008/02/ leslie_appleton_young _preso 101). Ft. Lauderdale, FL.
_read-only1.pdf.
Chen, S., & Yeh, C. (1996). Genetic programming
Barto, A. (1996). Muti-agent reinforcement learn- learning and the cobweb model . In Angeline, P.
ing and adaptive neural networks. Retrieved De- (Ed.), Advances in Genetic Programming (Vol.
cember 2008, from http://stinet.dtic.mil/cgi-bin/ 2, pp. 443–466). Cambridge, MA: MIT Press.
GetTRDoc?AD=ADA315266&Location=U2&d
oc =GetTRDoc.pdf. Chen, X., & Tokinaga, S. (2006). Analysis of price
fluctuation in double auction markets consisting
Bostonbubble.com. (2007). S&P/Case-Shiller of multi-agents using the genetic programming for
Boston snapshot Q3 2007. Retrieved December learning. Retrieved from https://qir.kyushuu.ac.jp/
2008, from http://www.bostonbubble.com/fo- dspace/bitstream /2324/8706/ 1/ p147-167.pdf.
rums/viewtopic.php?t=598.
China Bystanders. (2008). Bank profits trimmed
C.A.R. (2008). U.S. economic outlook: 2008. by subprime losses. Retrieved from http://chinaby-
Retrieved December 2008, from http://rodomino. stander. wordpress.com /2008/03/25/bank-profits-
realtor.org/Research.nsf/files/ currentforecast. trimmed-by-subprime-losses/.
pdf/$FILE/currentforecast.pdf.
CME 2007. (n.d.). Retrieved December 2008,
Case, K., Glaeser, E., & Parker, J. (2000). Real es- from http://www.cme.com/trading/prd/re/hous-
tate and the macroeconomy. Brookings Papers on ing.html.
Economic Activity, 2, 119–162. doi:.doi:10.1353/
eca.2000.0011 Commonweal of Australia. (2001). Economic
Outlook. Retrieved December 2008, from http://
Case, K., & Shiller, R. (1989). The efficiency of www.budget.gov.au/2000-01/papers/ bp1/html/
the market for single-family homes. The American bs2.htm.
Economic Review, 79, 125–137.
Economist.com. (2007). The world economy:
Rocky terrain ahead. Retrieved December 2008,
from http://www.economist.com/ daily/news/
displaystory.cfm?storyid=9725432&top_story=1.

16
A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index

Fair, R., & Jaffee, D. (1972). Methods of estima- Koza, J. (1992). Genetic programming. Cam-
tion for markets in disequilibrium. Econometrica, bridge, MA: The MIT Press.
40, 497–514. doi:.doi:10.2307/1913181
Ludwig, A., & Torsten, S. (2001). The impact of
Freddie Mac. (2008a). CMHPI data. Retrieved stock prices and house prices on consumption in
December 2008, from http://www.freddiemac. OECD countries. Retrieved December 2008, from
com/finance/ cmhpi/#old. http://www.vwl.uni-mannheim.de/brownbag/
ludwig.pdf.
Freddie Mac. (2008b). 30-year fixed rate histori-
cal Tables. Historical PMMS® Data. Retrieved Money, C. N. N. com (2008). World economy
December 2008, from http://www.freddiemac. on thin ice - U.N.: The United Nations blames
com/pmms/pmms30.htm. dire situation on the decline of the U.S. housing
and financial sectors. Retrieved December 2008,
Group, C. M. E. (2007). S&P/Case-Shiller Price
from http://money.cnn.com/2008/05 /15/news/
Index: Futures and options. Retrieved December
international/global_economy.ap/.
2008, from http://housingderivatives. typepad.
com/housing_derivatives/files/cme_housing Moody’s. Economy.com (2008). Case-Shiller®
_fact_sheet.pdf. Home Price Index forecasts. Moody’s Analytics,
Inc. Retrieved December 2008, from http://www.
Haji, K. (2007). Subprime mortgage crisis casts a
economy.com/home/products/case_shiller_in-
global shadow – medium-term economic forecast
dexes.asp.
(FY 2007~2017). Retrieved December 2008, from
http://www.nli-research.co.jp/english/econom- National Association of Home Builders, The Hous-
ics/2007/ eco071228.pdf. ing Policy Department. (2005). The local impact
of home building in a typical metropolitan area:
Housing Predictor. (2008). Independent real
Income, jobs, and taxes generated. Retrieved De-
estate housing forecast. Retrieved December
cember 2008, from http://www.nahb.org/fileUp-
2008, from http://www.housingpredictor.com/
load_details.aspx?contentTypeID=3&contentID=
california.html.
35601& subContentID=28002.
Iacono, T. (2008). Case-Shiller® Home Price
NeuroSolutionsTM (2002). The Neural Network
Index forecasts: Exclusive house-price forecasts
Simulation Environment. Version 3, NeuroDimen-
based on Fiserv’s leading Case-Shiller Home
sions, Inc., Gainesville, FL.
Price Indexes. Retrieved December 2008, from
http://www.economy.com/home/products/ case_ Principe, J., Euliano, N., & Lefebvre, C. (2000).
shiller_indexes.asp. Neural and Adaptive Systems: Fundamentals
through Simulations. New York: John Wiley &
Kaboudan, M. (2001). Genetically evolved mod-
Sons, Inc.
els and normality of their residuals. Journal of
Economic Dynamics & Control, 25, 1719–1749. Standard & Poor’s. (2008a). S&P/Case-Shiller®
doi:.doi:10.1016/S0165-1889(00)00004-X Home Price Indices Methodology. Standard &
Poor’s. Retrieved December 2008, from http://
Kaboudan, M. (2004). TSGP: A time series ge-
www2.standardandpoors.com/spf/pdf/index/
netic programming software. Retrieved December
SP_CS_Home_ Price_Indices_ Methodology_
2008, from http://bulldog2.redlands.edu/ fac/
Web.pdf.
mak_kaboudan/tsgp.

17
A Multi-Agent System Forecast of the S&P/Case-Shiller LA Home Price Index

Standard & Poor’s. (2008b). S&P/Case-Shiller U.S. Census Bureau. (2008a). Housing vacan-
Home Price Indices. Retrieved December 2008, cies and home ownership. Retrieved December
from http://www2.standardandpoors.com/ por- 2008, from http://www.census.gov/hhes/ www/
tal/site/sp/en/us/page.topic/indices_csmahp/ histt10.html.
2,3,4,0,0,0,0,0,0,1,1,0,0,0,0,0.html.
U.S. Census Bureau. (2008b). New residential
Stark, T. (2008). Survey of professional forecast- construction. Retrieved December 2008, from
ers: May 13, 2008. Federal Reserve Bank of Phila- http://www.census.gov/const/www/newrescon-
delphia. Retrieved December 2008, from http:// stindex_excel.html.
www.philadelphiafed.org/files/spf/survq208.html
Vanstone, B., & Finnie, G. (2007). An empirical
Tsang, E., Li, J., & Butler, J. (1998). EDDIE methodology for developing stockmarket trad-
beats the bookies. Int. J. Software. Practice ing systems using artificial neural networks.
and Experience, 28, 1033–1043. doi:10.1002/ Retrieved December 2008, from http://epub-
(SICI)1097-024X(199808)28:10<1033::AID- lications.bond.edu.au/cgi/ viewcontent.cgi?
SPE198>3.0.CO;2-1 article=1022&context=infotech_pubs.
U.S. Bureau of Economic Analysis. (2008). Re- Warren, M. (1994). Stock price prediction using
gional economic accounts: State personal income. genetic programming . In Koza, J. (Ed.), Genetic
Retrieved December 2008, from http://www.bea. Algorithms at Stanford 1994. Stanford, CA: Stan-
gov/regional/sqpi/default.cfm?sqtable=SQ1. ford Bookstore.

18
19

Chapter 2
An Agent-Based Model for
Portfolio Optimization Using
Search Space Splitting
Yukiko Orito
Hiroshima University, Japan

Yasushi Kambayashi
Nippon Institute of Technology, Japan

Yasuhiro Tsujimura
Nippon Institute of Technology, Japan

Hisashi Yamamoto
Tokyo Metropolitan University, Japan

ABSTRACT
Portfolio optimization is the determination of the weights of assets to be included in a portfolio in order
to achieve the investment objective. It can be viewed as a tight combinatorial optimization problem that
has many solutions near the optimal solution in a narrow solution space. In order to solve such a tight
problem, we introduce an Agent-based Model in this chapter. We continue to employ the Information
Ratio, a well-known measure of the performance of actively managed portfolios, as an objective func-
tion. Our agent has one portfolio, the Information Ratio and its character as a set of properties. The
evolution of agent properties splits the search space into a lot of small spaces. In a population of one
small space, there is one leader agent and several follower agents. As the processing of the populations
progresses, the agent properties change by the interaction between the leader and the follower, and when
the iteration is over, we obtain one leader who has the highest Information Ratio.

INTRODUCTION determines the appropriate weights of assets


included in a portfolio in order to achieve the
Portfolio optimization, based on the modern investment objective. This optimization problem
portfolio theory proposed by Markowitz (1952), is one of the combinatorial optimization problems
and the solution is the performance value obtained
DOI: 10.4018/978-1-60566-898-7.ch002

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

by a portfolio. When we attempt to solve such an of assets have similar performance values. The
optimization problem, we usually find candidates second problem is that there are many solutions
for the solutions that are better than others. The near the optimal solution. It is hard to solve such
space of all feasible solutions is called the search a tight optimization problem even with strong
space. There are a number of possible solutions evolutionary algorithms.
in the search space and finding the best solution In this chapter, we propose an Agent-based
is thus equal to finding some extreme values, Model in order to solve this tight optimization
minimum or maximum. In the search space, we problem. In general, agent-based models describe
want to find the best solution, but it is hard to solve interactions and dynamics of a group of trad-
in reasonable time as the number of assets or the ers in the artificial financial market (LeBaron,
number of weights of each asset grows. Because 2000). Our Agent-based Model is implemented
there are many possible solutions in the search as a global and local search method for the port-
space, it is usually hard for us to know where to folio optimization problem. Our agent has a set
find a solution or where to start. In order to solve of properties: its own portfolio, a performance
such a problem, many researchers use methods value obtained by the portfolio and its character.
based on evolutional algorithms: for example, In the starting population, there is one leader
genetic algorithm (GA), simulated annealing, tabu agent, and there are many follower agents. The
search, some local searches and so on. follower agents are categorized into three groups,
There are two investment objectives for portfo- namely obedient group, disobedient group, and
lio management: active management and passive indifferent group. In the first group, the followers
management. Active management is an investment obediently follow the leader’s behaviors. In the
strategy that seeks returns in excess of a given second group, the followers are disobedient and
benchmark index. Passive management is an in- adopt behaviors opposite to that of the leader.
vestment strategy that mirrors a given benchmark In the third group, the followers determine their
index. Thus, if you believe that it is possible to behaviors quite independently. As processing of
outperform the market, you should invest in an the population proceeds through search space
active portfolio. The Information Ratio and the splitting, the agent properties change through the
Sharpe Ratio are well-known indices for active interaction between the leader and the followers,
portfolio evaluation. On the other hand, if you think and gradually a best performing agent (the leader
that it is not possible to outperform the market, agent) with the highest performance value emerges
you should invest in a passive portfolio. There are as the optimal solution. Hence, our Agent-based
several reports that index funds that employ pas- Model has the advantage that our model searches
sive management show better performance than solutions in global space as well as local spaces
other mutual funds (e.g. see Elton et. al, 1996; for this tight optimization problem, because plural
Gruber, 1996; Malkiel, 1995). The correlation leader agents appear and disappear during search
between the portfolio price and the benchmark space splitting.
index and Beta are famous indices that are used The structure of the balance of this chapter
to evaluate the passive portfolio. is as follows: Section 2 describes related works.
This optimization problem can be viewed as a Section 3 defines the portfolio optimization prob-
discrete combinatorial problem regardless of the lem and describes the performance value used to
index we choose to evaluate the performance of evaluate the portfolio as the objective function. In
active or passive portfolios. Hence, this optimiza- Section 4, we propose an Agent-based Model in
tion problem has two subproblems. The first one is order to optimize the portfolios. Section 5 shows
that portfolios consisting of quite different weights

20
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

the results of numerical experiments obtained by On the other hand, agent-based models for
the simulation of the Agent-based Model. artificial financial markets have recently been
popular in research. The agent properties change
by the interaction between agents, and gradually
RELATED WORKS a best performing agent emerges as an optimal
solution (e.g. see LeBaron et al., 1999; Chen &
Markowitz (1987) proposed the mean-variance Yeh, 2002). Based on the idea of agent-based
methodology for portfolio optimization problems. models, we propose an Agent-based Model for the
The objective function in his methodology is to portfolio optimizations in this chapter. In general,
minimize the variance of the expected returns as many agent-based models describe interactions
the investment risk under the expected return as and dynamics in a group of traders in an artificial
the investment return. Many researchers have ex- financial market (LeBaron, 2000). Our Agent-
tended his methodology to practical formulae and based Model is implemented as a global and local
applications, and have tackled this problem by us- search method for portfolio optimization.
ing methods based on evolutional algorithms. Xia
et al. (2000) proposed a new mean-variance model
with an order of the expected returns of securities, PORTFOLIO OPTIMIzATION
and applied a GA to optimize the portfolios. Chang PROBLEM
et al. (2000) proposed the extended mean-variance
models, and applied various evolution algorithms, First, we define the following notations for the
such as GA, simulated annealing and tabu search. portfolio optimization problem.
Lin and Liu (2008) proposed the extended mean- N: total number of all the assets included in
variance model with minimum transaction lots, and a portfolio.
applied GA to optimize the practical portfolios. i: Asset i, i = 1,…, N.
Streichert & Tanaka-Yamawaki (2006) evaluated Findex (t ) : the value of the given benchmark
a multi-objective evolutionary algorithm with a index at t.
quadratic programming local search for the multi-
Pindex : the sequence of the rates of changes of
objective constrained or unconstrained portfolio
benchmark index over t = 1,…,T. That is the vec-
optimization problems. For passive portfolio op-
timization problems, Oh et al. (2005) showed the ( )
t o r Pindex = Pindex (1), , Pindex (T ) w h o s e
effectiveness of index funds optimized by a GA Pindex (t ) i s d e f i n e d a s
on the Korean Stock Exchange. Their objective
function was a function based on a beta which is ( )
Pindex (t ) = Findex (t + 1) − Findex (t ) Findex (t ) .
a measure of correlation between the fund’s price Fi(t): the price of Asset i at t.
and the benchmark index. Orito & Yamamoto Pi: the sequence of the return rates of Asset i
(2007) and Orito et al. (2009) proposed GA meth- over t = 1,…, T. That is the vector Pi=(Pi (1),…,Pi
ods with a heuristic local search and optimized the (T))hose Pi(t) is defined as Pi(t) = (Fi(t+1)-Fi(t))/
index funds on the Tokyo Stock Exchange. Their Fi(t).
objective functions were correlation coefficients M: total number of all the units of investment.
between the fund’s return rates and the changing Mi : the unit of investment for Asset i. That is
rates of benchmark indices. Aranha & Iba (2007) an integer such that ∑ Mi = M.
also proposed a similar but different method on wi: the weight of Asset i included in the port-
the Tokyo Stock Exchange. folio. That is a real number wi = Mi /M (0 ≤ wi ≤
1. Note that we do not discuss the short sale here.

21
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 1. Re-composition of the populations


Gk: the portfolio of k-th agent. That is the vec-
N
tor Gk= w1,…,wN) such that ∑w i
= 1.
i =1

PG : the sequence of the return rates of port-


j

folio Gk over t = 1,…, T. That is the vector

k
(
PG = PG (1), , PG (T )
k k
) w h o s e
N
PG (t ) = ∑ wi ⋅ Pi (t ).
k
i =1
It is well known that the Information Ratio, AGENT-BASED MODEL
which is built on the modern portfolio theory, is
an index for the evaluation of active portfolios. In our Agent-based Model, the search space is
It is defined as the active return divided by the split into several small spaces. Agents in one
tracking error. The active return means the amount population search for solutions in one small space.
of performance over or under a given benchmark One population consists of one leader agent and
index. The tracking error is the standard deviation several follower agents. As the agents’ properties
of the active returns. Therefore, it is desirable to through evolve, a new leader appears, and then all
achieve a high Information Ratio. populations in the search space are re-composed.
In this chapter, we define the Information Ratio Such re-composition of populations represents
of portfolio Gk as the objective function for the the search space splitting as shown in Figure 1.
portfolio optimization problem to be as follows: In this section, we describe our Agent-based
Model. Section 1 defines the agent. Section 2
defines the evolutional behavior of the agents in
E PG − Pindex  a population. Section 3 describes the outcome of
max IRG =  k  , (1)
k processing of the population as the search space
var PG − Pindex  splitting.
 k 

Agents

where E PG − Pindex  is the expected value of Figure 2 shows a typical agent in our Agent-based
 k  Model. Each agent has its own portfolio, a unit of
the historical data of portfolio’s return rates over
investment of the portfolio, the Information Ratio,
or under the rates of changes of benchmark index,
and its character as the properties. The character
and var PG − Pindex  is the standard deviation of agent is either “obedient”, “disobedient”, or
 k  “independent.” The agent’s evolutional behavior
from the same data.
depending on its character is described in Section 2.
In this chapter, we optimize portfolios that
consist of the combination of weights of N assets
Evolutional Behavior of Agents in a
with M elements in order to maximize the Infor-
Population
mation Ratio given by Equation (1). To obtain
one portfolio Gk = (w1,…,wN) means to obtain its
For our Agent-based Model, let s be the number
Information Ration as one solution. The number
of iterations of the evolutional process. In the first
of combinations, i.e. the number of solutions, is
(s = 1) iteration, we set W agents as the initial
given by (M + N – 1)!/N!(M -1)!.

22
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 2. Agent properties Figure 3. Behavior of the obedient agent

population. For the unit of investment of Asset i


for Agent k, an integer Mi (i = 1,…, N) is ran-
domly given. Then Agent k has the portfolio Gk
and its Information Ratio. In addition, Agent k is
randomly given one of the following characters: 1. We randomly select n assets from N assets
obedient, disobedient or independent. The rates to put in the portfolio. For each of these n
of obedient agent, disobedient agent and indepen- assets, we replace the follower’s investment
dent agent in the initial population are given by units Mi with those of the leader’s.
aobd , adob and aidp , respectively. 2. We randomly repeat the selection of one asset
from N - n assets, which are not chosen in
An agent whose Information Ratio is the high-
step 1 and increase or decrease one invest-
est of all the agents in the population becomes
ment unit of it until ∑ Mi = M. We call this
the leader agent in its population and the others
operation “normalization” in this chapter.
remain as followers. On the s-th process, the
For example, Figure 3 shows an evolu-
follower agents’ properties change through the
tional behavior of an obedient follower
interaction between the leader and the follower
agent with N = 5, M = 10 and n = 3.
agents. On the other hand, the leader agent’s
properties change by its own rule. The evolutional
behaviors of each kind of follower and the leader
Disobedient Follower
are described below.
If Agent k is a follower agent and its character
is disobedient, Agent k is a follower agent that
Behavior of Follower Agent
does not imitate a part of the leader’s portfolio.
The procedure of the disobedient follower’s evo-
The follower agents, depending on their character:
lutional behavior is as follows:
obedient, disobedient or independent, change their
properties through the interaction with the leader
1. We randomly select n assets from N assets
agent, and evolve into new agents.
to put in the portfolio. For each of these n
assets, we subtract the follower’s investment
Obedient Follower
units Mi from those of the leader. If the se-
If Agent k is a follower agent and its character
lected asset’s investment units becomes 0,
is obedient, Agent k is an agent that imitates a
the subtraction of the asset stops.
part of the leader’s portfolio. The procedure of
2. For normalization, we randomly repeat the
the obedient follower’s evolutionary behavior is
selection of one asset from N - n assets not
as follows:
chosen in step 1 and increase one investment
unit of it until ∑ Mi = M.

23
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 4. [Behavior of the disobedient agent]

For example, Figure 4 shows an evolutional Behavior of the Leader Agent


behavior of a disobedient agent with N = 5, M =
10 and n = 3. The leader in a population is an agent whose In-
formation Ratio is the highest of all agents in the
Independent Follower search space when the population is generated. The
The independent follower is an agent whose leader agent’s properties change by its own rule.
behavior is not influenced by the leader, and The procedure for the evolutionary behavior of
changes its portfolio by its own independent rule the leader is as follows. We note that the leader’s
defined as Equation (2). A randomly chosen integer Information Ratio in the re-composed population
Mi is the set for the unit of investment. is not the highest of all agents in the search space.
We describe the outcome of processing of the
Mi ∈[0, V] (i =1,…, N), (2) population in the Section 3.

1. We randomly select n assets from N assets


where V is a given integer parameter as a upper to put in the portfolio. We express these n
bound of the number of investment units. The new assets as {i1, , in } and call it Group k1. As
investments units are to be normalized. a similar manner, we express the N - n assets
By performing the evolutionary behaviors of
as {in +1, , iN } and call it Group k2. We
obedient, disobedient and independent followers
according to their assigned characters, all the fol- define the expected value of the historical
lower agents have new investment units (M1,…, data of each group’s return rates over or
MN). Hence the new portfolio Gk = (g1,…,gN) is under the rates of changes of benchmark
calculated again and the Information Ratio of index as follows;
the portfolio is updated in order to complete the
 
evolutions of all the properties of each follower.
E PG − Pindex 
  1 k


(G = (w , , w )) .
k1 i1 in

 
 E PG − Pindex 
  k2


(G = (w , , w ))
k2 in +1 iN

(3)

24
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 5. Behavior of the leader agent

    Processing of the Population


2. If E  PG − Pindex  ≥ E  PG − Pindex  in
 k1
  k2

Equation (3) is satisfied, we move z invest- In this section, we describe the outcome of pro-
ment units for the randomly selected assets cessing the population. On the s-th process, the
from Group k2 to Group k1. On the other properties of the leader and the follower agents
h a n d , i f are updated by their evolutionary behaviors de-
    scribed in the section on the evolutional behavior
E  PG − Pindex  < E PG − Pindex  is of agents. Hence, the new leader agent who has
 k1   k2 
satisfied, we move z investment units for the highest Information Ratio appears. The new
the randomly selected assets from Group k1 population, with its new leader agent, is gener-
to Group k2. ated on the s +1st process. In other words, the
progression of the population in our Agent-based
For example, Figure 5 shows an evolutional Model repeats the generation of the new leader
behavior of the leader agent with N = 5, M = 10, agent and the division of current populations of
n = 3 and z = 2. the s-th process. Populations with no follower
Upon execution of this behavior, the leader agents, which are caused by repeated splitting
agent has the new investment units (M1,…, MN. of search spaces, disappear. The leader of such
Hence the new portfolio Gk = (g1, …, gN) is cal- a population becomes a follower agent in one of
culated again and the Information Ratio of the other populations. Figure 6 shows such a series
portfolio is updated in order to complete the of processes as an example. We note that, in our
evolution of all the properties. model, it is possible that an agent who has lower
Information Ratio than the new leader’s ratio and
has higher Information Ratio than the current

25
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 6. Progression populations during search


leader’s ratio remains as a follower agent in the
space splitting
current leader’s population.
As shown in Figure 6, one population consists
of one leader agent and its follower agents. When
a population is split, all follower agents select the
current leader in the current population or the new
leader whose Information Ratio is the highest in
the search space as their own leader on the next
process. The criterion for this selection is defined
as follows.

1. When the new leader appears in the popula-


tion to which some of the current follower
agents belong, the rate bsame is used to select
the new followers in the population of the
new leader agent on the next process. On
the other hand, the ratio of 1 - bsame follow-
ers remain to be followers of the current
leader.
2. When the new leader appears in the popula-
tion to which the follower agents do not
belong, the rate bdiff is used to select the
known benchmark index in the First Section of
followers in the population of the new leader
Tokyo Stock Exchange, represents the increase or
agent on the next process. On the other hand,
decrease in stock values of all companies listed on
the ratio of 1 - bdiff followers remain to be
the market. For this, these assets are typical assets
followers of the current leader. that have a big influence on the value of TOPIX.

After repeating these processes, we select the Information Ratio as a Function of


portfolio whose Information Ratio is the highest Parameters in an Agent-Based Model
in the search space as our optimal portfolio.
The Information Ratios of the optimal portfolios
obtained by our Agent-based Model depend on
SIMULATION RESULTS the starting parameters. For the combinatorial
optimization problem with the total number of
We have applied our Agent-based Model to each of all the assets included in a portfolio N = 200 and
12 data periods on the First Section of Tokyo Stock the total number of all the units of investment M
Exchange from Jan. 6, 1997 to Oct. 2, 2006. Each = N×100, we applied our Agent-based Model us-
data period is 100 days, and is shifted every 200 ing parameters given by Table 1 for the numerical
days from Jan. 6, 1997. It is called from Period 1 experiments.
(Jan. 6, 1997 - May. 30, 1997) to Period 12 (May. For n = 20(=N×0.1) and n = 40(=N×0.2), the
12, 2006 - Oct. 2, 2006), respectively. The dataset Information Ratios of the optimal portfolios ob-
consists of assets in the order of high turnover tained by our Agent-based Model for the twelve
average as a subset of TOPIX. The TOPIX, a well

26
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Table 1. Parameters in Agent-based Model

The total number of all the assets included in a portfolio: N 200

The total number of all the units of investment: M 20000 (=N ×100)

The total number of agents in the solution space 100

The number of iterations of the evolutionary process: K 100,200,300,400,500


The number of selected assets for the interaction between N × c(c = 0.0, 0.1, 0.2,…,1)
the leader and the follower: n 0,20,40,60,80,100,120,140,160,180,200
The ratio of obedient, disobedient and independent agent {1,0,0},{0.8,0.2,0},{0.8,0,0.2},{0.6,0.4,0},
in the solution space: {a abd
, adob , aidp }
{0.6,0.2,0.2},{0.6,0,0.4},{0.4,0.6,0},{0.4,0.4,0.2},
{0.4,0.2,0.4},{0.4,0,0.6},{0.2,0.8,0},{0.2,0.6,0.2},
{0.2,0.4,0.4},{0.2,0.2,0.6},{0.2,0,0.8},{0,1,0},
{0,0.8,0.2},{0,0.6,0.4},{0,0.4,0.6},{0,0.2,0.8},{0,0,1}
The ratio of follower agent who moves to the new {1,1},{1,0.75},{1,0.5},{1,0.25},{1,0},{0.75,1},
leader’s population from the same and the different {0.75,0.75},{0.75,0.5},{0.75,0.25},{0.75,0},
population to which the current leader belong: {0.5,1},{0.5,0.75},{0.5,0.5},{0.5,0.25},{0.5,0},
{b same
, bdiff } {0.25,1},{0.25,0.75},{0.25,0.5},{0.25,0.25},
{0.25,0},{0,1},{0,0.75},{0,0.5},{0,0.25},{0,0}

The number of investment units moved between Groups


2000 (=M × 0.1)
k1 and k2: z

periods as a function of the repetition of the pro- {a abd


, adob , aidp } = {0.6, 0.2, 0.2} a n d
gresses process are shown in Figure 7 (a) and (b),
respectively. We repeated the evolution process {b same
, bdiff } = {0.5, 0.5} .
up to 500 times. Here, we set the parameters From Figure 8, we can observe that the Infor-
mation Ratio is high when n is small except for
{aabd , adob , aidp } a n d {bsame , bdiff } t o the case of n = 0. As defined in the behavior of
{a abd
, adob , aidp } = {0.6, 0.2, 0.2} a n d follower agent section, n is the number of assets
{b same
, bdiff } = {0.5, 0.5} . for which an obedient follower matches the
leader’s weight or that a disobedient follower
(a) n=20
subtracts from its portfolio contrary to the leader’s
(b) n=40
weight of that asset in the leader’s portfolio. In
From Figure 7 (a) and (b), we can observe
this context, we can conclude that Figure 8 sug-
that the Information Ratio becomes higher as the
gests that good agents do not appear when the
number of repetitions of the process increases.
interaction between the leader and the follower
Therefore, we set the number of repetitions to be
is too strong. Therefore, we set as n = 40 for
K = 500 for further experiments.
further experiments, because its Information
Next, the Information Ratio average of all
Ratio is the highest of all. This means that the
periods as a function of the number of selected
followers interact with 20% of leader’s assets.
assets for the interaction between the leader and
Next, we discuss the influence of parameters
the follower n is shown in Figure 8. As well as
Figure 7, we set the parameters {aabd , adob , aidp } {aabd , adob , aidp } and {bsame , bdiff } on the portfo-
lio optimization problem. The Information Ratio
a n d {b same
, bdiff } t o
average of all periods as a function of
{aabd , adob , aidp } and a function of {bsame , bdiff }

27
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 7. Information Ratio as a function of the number of iterations of the evolutionary process

28
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 8. Information Ratio as a function of the number of selected assets for the interaction between
the leader and the follower

is shown in Figures 9 and 10, respectively. Note solution space is large. This means that the effec-
that we set {bsame , bdiff } = {0.5, 0.5} for Figure tive population should consist of many obedient
agents and few disobedient and independent
9 and {aabd , adob , aidp } = {0.6, 0.2, 0.2} for Figure
agents. Therefore, we set {aabd , adob , aidp } to be
10.
{0.8, 0, 0.2} for experiments in the next section.
From Figure 9, the Information Ratio with
On the other hand, from Figure 10, we can
{aabd , adob , aidp } = {0.8, 0, 0.2} is the highest. We observe that almost of all the Information Ratios
can conclude that the Information Ratio is high are similar. The exceptions are the cases when
when the proportion of obedient followers in the each of bsame and bdiff is set to 0 or 1. This means

Figure 9. Information Ratio as a function of the ratio {alpha_obd,alpha_dob,alpha_idp}

29
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Figure 10. Information Ratio as a function of the ratio {beta_same,beta_diff}

that the ratio of the follower agents who moves is well known that GA is a useful stochastic search
to the new leader’s population from the same or method for such optimization problems (for this,
the different population to which the current see e.g. Holland, 1975; Goldberg, 1989). The
leader belong does not affect the results in our genetic representation of our GA is shown in
Agent-based Model. Figure 11. A gene represents a weight of an asset
wi and a chromosome represents a portfolio Gk.
Comparison of Agent- The fitness value of GA is defined as the Informa-
Based Model and GA tion Ratio.
On the first generation of the GA, we ran-
We compare our Agent-based Model with GA for domly generate the initial population. We apply
the combinatorial optimization problem with the the uniform crossover for exchanging the partial
total number of assets N = 200 and the total num- structure between the two chromosomes and repair
ber of units of investment M = 20000. Therefore, to a probability distribution via renormalization.
the number of combinations is given by (20000 We also apply the uniform mutation for replacing
+ 200 – 1)!/200! (20000 – 1)!. For our model, we the partial structure of the selected chromosomes
set parameters as K = 500, n = 40, with a new random value in [0, 1] and repair to a
{aabd , adob , aidp } = {0.8, 0, 0.2} a n d probability distribution via renormalization. After
making offspring, we apply a roulette wheel selec-
{b same
, bdiff } = {0.5, 0.5} . On the other hand, it
tion and an elitism method of 10% chromosomes

Figure 11. Genetic representation

30
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

based on the fitness value. As a termination cri- leader and the n followers, and the ratio of the
terion of GA, we apply the generation size. Note obedient, disobedient and independent agents in
that the population size, the generation size, the the solution space {aabd , adob , aidp } .
crossover rate and the mutation rate are set to the
similar values of our Agent-based Model, 100
(the same value of the total number of assets in CONCLUSION
the solution space), 500 (the same value of K),
0.8 (the same value of the ratio of obedient fol- In this chapter, we proposed an Agent-based Model
lower in the solution space) and 0.2 (the same to solve the portfolio optimization problem which
value of the ratio of independent follower in the is a tight optimization problem. The most notable
solution space), respectively. The Information advantage of our Agent-based Model is that our
Ratio average of 20 simulations obtained by our model searches solutions in global space as well
Agent-based Model and GA are shown in Table as local spaces.
2. From the numerical experiments, we conclude
From Table 2, we can observe that the Infor- that our Agent-based Model using search space
mation Ratios of portfolios obtained by our Agent- splitting produces more optimal portfolios than
based Model are higher than those of GA for all simple GA. However, the results obtained by our
the periods. In almost all the periods, the Informa- model depend on the parameters; the number of
tion Ratios obtained by our model exceeds the selected assets for the interaction between the
final results of GA within 50 iterations of evolu- leader and the n followers, and the ratio of the
tionary process. obedient, disobedient and independent agents in
Therefore, we can conclude that our Agent- the solution space {aabd , adob , aidp } .
based Model using search space splitting pro-
Sometimes, portfolios consisting of quite dif-
duces more optimal portfolios than simple GA.
ferent weights of assets have the similar Informa-
However, the results obtained by our Agent-based
tion Ratios to the ratios of other portfolios. We
Model depend on the parameters; the number of
do not have reasonable explanations for this fact.
selected assets for the interaction between the

Table 2. Information Ratio average

Period No. Agent-based Model GA


1 0.886301 0.359296
2 0.713341 0.263974
3 1.353191 0.470981
4 1.384440 0.488758
5 1.673957 0.542885
6 0.792513 0.445230
7 1.002361 0.350776
8 0.765283 0.401970
9 1.129314 0.462260
10 0.869310 0.309170
11 0.514538 0.332241
12 0.364809 0.226260

31
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

For this, it would be beneficial for us to visualize Goldberg, D. E. (1989). Genetic Algorithms in
the landscape partially and improve our Agent- Search, Optimization and Machine Learning.
based Model on the partial landscape. It is hard Addison-Wesley.
to visualize the landscape of solutions, however,
Gruber, M. J. (1996). Another Puzzle: The Growth
because this problem can be viewed as a discrete
in Actively Managed Mutual Funds. The Journal
combinatorial problem. In addition, we need to
of Finance, 51(3), 783–810. doi:10.2307/2329222
rebalance the portfolios in the future period in
order to maintain their performance. These issues Holland, J. H. (1975). Adaptation in Natural and
are reserved for our future work. Artificial Systems. University of Michigan Press.
LeBaron, B. (2000). Agent-based Computational
Finance: Suggested Readings and Early Research.
ACKNOWLEDGMENT
Journal of Economics & Control, 24, 679–702.
doi:10.1016/S0165-1889(99)00022-6
The authors are grateful to Kimiko Gosney who
gave us useful comments. The first author ac- LeBaron, B., Arthur, W. B., & Palmer, R. (1999).
knowledges partial financial support by Grant Time Series Properties of an Artificial Stock Mar-
#20710119, Grant-in-Aid for Young Scientists ket. Journal of Economics & Control, 23, 1487–
(B) from JSPS, (2008-). 1516. doi:10.1016/S0165-1889(98)00081-5
Lin, C. C., & Liu, Y. T. (2008). Genetic Algorithms
for Portfolio Selection Problems with Minimum
REFERENCES
Transaction Lots. European Journal of Opera-
Aranha, C., & Iba, H. (2007). Portfolio Manage- tional Research, 185(1), 393–404. doi:10.1016/j.
ment by Genetic Algorithms with Error Model- ejor.2006.12.024
ing. In JCIS Online Proceedings of International Malkiel, B. (1995). Returns from Investing in
Conference on Computational Intelligence in Equity Mutual Funds 1971 to 1991. The Journal
Economics & Finance. of Finance, 50, 549–572. doi:10.2307/2329419
Chang, T. J., Meade, N., Beasley, J. E., & Sharaiha, Markowitz, H. (1952). Portfolio Selection. The
Y. M. (2000). Heuristics for Cardinality Con- Journal of Finance, 7, 77–91. doi:10.2307/2975974
strained Portfolio Optimization . Computers & Op-
erations Research, 27, 1271–1302. doi:10.1016/ Markowitz, H. (1987). Mean-Variance Analysis in
S0305-0548(99)00074-X Portfolio Choice and Capital Market. New York:
Basil Blackwell.
Chen, S. H., & Yeh, C. H. (2002). On the Emer-
gent Properties of Artificial Stock Markets: The Oh, K. J., Kim, T. Y., & Min, S. (2005). Using
Efficient Market Hypothesis and the Rational Genetic Algorithm to Support Portfolio Optimiza-
Expectations Hypothesis. Journal of Behavior tion for Index Fund Management . Expert Systems
&Organization, 49, 217–239. doi:10.1016/S0167- with Applications, 28, 371–379. doi:10.1016/j.
2681(02)00068-9 eswa.2004.10.014

Elton, E., Gruber, G., & Blake, C. (1996). Survivor-


ship Bias and Mutual Fund Performance. Review
of Financial Studies, 9, 1097–1120. doi:10.1093/
rfs/9.4.1097

32
An Agent-Based Model for Portfolio Optimization Using Search Space Splitting

Orito, Y., Takeda, M., & Yamamoto, H. (2009). KEY TERMS AND DEFINITIONS
Index Fund Optimization Using Genetic Algo-
rithm and Scatter Diagram Based on Coefficients Portfolio Optimization: A combinatorial
of Determination. Studies in Computational In- optimization problem that determines proportion-
telligence: Intelligent and Evolutionary Systems, weighted combination in a portfolio in order to
187, 1–11. achieve an investment objective.
Information Ratio: A well-known measure
Orito, Y., & Yamamoto, H. (2007). Index Fund of performance of actively managed portfolios.
Optimization Using a Genetic Algorithm and Agent Property: An agent has one portfolio,
a Heuristic Local Search Algorithm on Scatter its Information Ratio and a character as a set of
Diagrams. In Proceedings of 2007 IEEE Congress properties.
on Evolutionary Computation (pp. 2562-2568). Leader Agent: An agent whose Information
Streichert, F., & Tanaka-Yamawaki, M. (2006). Ratio is the highest of all agents in search space.
The Effect of Local Search on the Constrained Follower Agent: An agent is categorized
Portfolio Selection Problem. In Proceedings of into any of three groups, namely obedient group,
2006 IEEE Congress on Evolutionary Computa- disobedient group, and indifferent group. An
tion (pp. 2368-2374). obedient agent is an agent that imitates a part of
the leader’s portfolio. A disobedient agent is an
Xia, Y., Liu, B., Wang, S., & Lai, K. K. (2000). agent that does not imitate a part of the leader’s
A Model for Portfolio Selection with Order of portfolio. An independent agent is an agent whose
Expected Returns. Computers & Operations behavior is not influenced by the actions of the
Research, 27, 409–422. doi:10.1016/S0305- leader agent.
0548(99)00059-3 Search Space Splitting: Re-composition of
populations. One population consists of one leader
agent and several follower agents. As agents’
properties evolve, a new leader appears, and then
all populations in a search space are re-composed.

33
Section 2
Neuro-Inspired Agents
35

Chapter 3
Neuroeconomics:
A Viewpoint from Agent-Based
Computational Economics
Shu-Heng Chen
National Chengchi University, Taiwan

Shu G. Wang
National Chengchi University, Taiwan

ABSTRACT
Recently, the relation between neuroeconomics and agent-based computational economics (ACE) has
become an issue concerning the agent-based economics community. Neuroeconomics can interest agent-
based economists when they are inquiring for the foundation or the principle of the software-agent design,
normally known as agent engineering. It has been shown in many studies that the design of software
agents is non-trivial and can determine what will emerge from the bottom. Therefore, it has been quested
for rather a period regarding whether we can sensibly design these software agents, including both the
choice of software agent models, such as reinforcement learning, and the parameter setting associated
with the chosen model, such as risk attitude. In this chapter, we shall start a formal inquiry by focusing
on examining the models and parameters used to build software agents.

NEUROECONOMICS: AN ACE entities and work with each of them separately;


VIEWPOINT instead, it studies the relationship between the
two in a coherent framework. Therefore, given
From the perspective of agent-based compu- the bottom-up manner, we pay more attention to
tational economics (ACE), our interest in neu- the micro details, and always start the modeling at
roeconomics is different from that of general the level of agents. This methodological individu-
psychologists and neural scientists. Agent-based alism drives us to incorporate the psychological,
computational economics advocates a bottom-up cognitive, and neural attributes of human beings
research paradigm for economics. This paradigm into the study of economics. What causes ACE
does not treat micro and macro as two separate to differ from these behavioral sciences is the
scope of the research questions; therefore, while
DOI: 10.4018/978-1-60566-898-7.ch003 ACE cares about the fundamental cause (the

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Neuroeconomics

neural cause) of the cognitive biases, it is more PREFERENCE


concerned with the implications of these cogni-
tive biases for any possible emergent mesoscopic “The nature of wealth and value is explained by
or macroscopic phenomena. Furthermore, ACE the consideration of an infinitely small amount
researchers do not regard the behavioral factors as of pleasure and pain, just as the theory of statics
given (exogenous); they also study the feedback is made to rest upon the equality of indefinitely
from the aggregate level (social outcome) to the small amounts of energy. (Jevons, 1879, p. 44;
bottom level (individual behavior). 1 Italics, added)”
Given what has been said above, we believe
that unless neuroeconomics can provide some
important lessons for agent-based computational Standard economic theory takes individual pref-
economists, its significance may hardly go far erences as given and fixed over the course of the
beyond neural science, and would not draw much individual’s lifetime. It would be hard to imagine
attention from economists. This, therefore, mo- how economic models can stand still by giving up
tivates us to ask: Does neuroeconomics provide preferences or utility functions. They serve as the
some important lessons for agent-based economic very foundation of economics just as we quoted
modeling? It is this question that this chapter above from William Stanley Jevons (1835-1882).
would like to address. Without preference or utility, it will no longer
In the following, we will review the recent be clear what we mean by welfare, and hence
progresses in neuroeconomics in light of its con- we make welfare-enhancing policy ill-defined.
tributions to different aspects of agent engineering. Nevertheless, preference is now in a troubling
We start from the most fundamental part of agents, moment in the development of economics. Even
i.e., preferences (Section 2), which points to two though its existence has been questioned, the
foundational issues in economics, namely, the development of neuroeconomics may further
measurement or representation of preference and deepen this turbulent situation.
the formation of preference. Some recent advances
in the study of these two issues may lead to new The Brain as a Multi-Agent System
insights in the future of agent engineering with
regard to preference development. We then move to The recent progress in neural science provides
the immediate issue after preferences, i.e., choices, economists with some foundational issues of
or, more precisely, value-based choices (Section economic theory. Some of its findings may lend
3), and further specify the intertemporal choice support to many heated discussions which are
(Section 3.1), where we can see how the discount unfortunately neglected by mainstream econom-
rate should be more carefully designed. We then ics. The most important series of questions is that
focus more on two behavioral aspects pertaining pertaining to preference. While its existence, for-
to the design of financial agents, namely, risk malization (construction), measurement, consis-
perception (Section 3.2.1) and risk preference tency and stability has long been discussed outside
(Section 3.2.2). The neural mechanism regard- mainstream economics, particularly in the realm of
ing learning or adaptation is given in Section behavioral economics, neuroeconomics provides
4. Finally, the chapter ends with a final remark us with solid ground to tackle these issues.2
that connects the relationships among behavioral To see how neuroscience can inform econo-
economics, neural economics and agent-based mists, it is important to perceive that the brain is
economics, which is a continuation of the points a multi-agent system. For example, consider the
made earlier (Chen, 2008). Triune Brain Model proposed by Maclean (1990).

36
Neuroeconomics

The brain is composed of three major parts: the Preference Construction


reptilian brain (the brainstem), the mammalian
brain (the limbic system), and the hominid brain “On the contrary, we approach choice within
(the cerebral cortex). Each of the three is associated specific, quite narrow frames of reference that
with different cognitive functions, while receiving continually shift with the circumstances in which
and processing different signals. The three parts we find ourselves and with the thoughts that are
also have various interactions (competition or evoked in our minds by these particular circum-
cooperation) with the embedded network. The stances. Thus, in any given choice situation, we
three “agents”’ and their interactions, therefore, evoke and make use only a small part even of the
constitute the very basis of this multi-agent system. limited information, knowledge and reasoning
This multi-agent system (MAS) approach to the skills that we have stored in our memory, and these
brain compels us to think hard on what would be memory contents, even if fully evoked, would
a neural representation of preference. Preference give us only a pale and highly inexact picture of
is unlikely to be represented by a signal neuron the world in which we live.” (Simon, 2005, p.
or a single part of the brain, but by an emergent 93, Italics added)
phenomenon from the interactions of many agents. The MAS approach to the study of the brain
Hence, many agents of the brain can contribute to may connect us to the literature on preference con-
part of the representation. So, when asked what struction for real human beings (Fischhoff, 1991;
the preference for commodity A is and its relative Slovic, 1995; Lichtenstein and Slovic, 2006), and,
comparison for B, many agents of the brain work in particular, the role of experiences and imagi-
together either in a synchronous or asynchronous nation in preference formation. In the following,
manner to generate a representation, the utility of we would like to exemplify a few psychological
A and B, say U(A) and U(B). studies which shed light on the experience-based
During the process, some agents retrieve the or imagination-based preferences.
past experiences (memory) of consuming A and B, Adaptive Decision Makers (Payne, Bettman,
and some agents aggregate this information. These and Johnson, 1993) The effort-accuracy frame-
processes can be collaborative or competitive; it work proposed by Payne, Bettman, and Johnson
is likely that some agents inhibit other agents to (1993) represents an attempt to shift the research
function. As a result, the memory can be partial, agenda from demonstrations of irrationality in the
which, depending on the external elicitation and form of heuristics and biases to an understanding
other conditions, can vary from time to time. of the causal mechanisms underlying the behav-
This rough but simple picture of the multi- ior. It has considerable merit as a model of how
agent neurodynamics may indicate why a steady decision makers cope with cognitive limitations.
preference conventionally assumed in economics The adaptive decision maker is a person whose
may not be there. The alternative is that people repertoire of strategies may depend upon many
do not have given unchanging preferences, but factors, such as cognitive development, experi-
rather their preferences are constructed to fit the ence, and more formal training and education.
situations they face. Herbert Simon is one of the Payne, Bettman, and Johnson (1993) suggest that
precursors of the idea of preference construction decision-making behavior is a highly contingent
(Simon, 1955, 1956). form of information processing and is highly
sensitive to task factors and context factors. They
consider that the cognitive effort required to make
a decision can be usefully measured in terms of
the total number of basic information processes

37
Neuroeconomics

needed to solve a particular problem using a specific decision strategy. In addition, they state that individual differences in decision behavior may be related to differences in how much effort the various elementary information processes the individuals are required to make.

Hedonic Psychology: Hedonic psychology is the study of what makes experiences and life pleasant or unpleasant (Kahneman, Diener, and Schwarz, 2003). It is concerned with feelings of pleasure and pain, of interest and boredom, of joy and sorrow, and of satisfaction and dissatisfaction. All decisions involve predictions of future tastes or feelings. Getting married involves a prediction of one's long-term feelings towards one's spouse; returning to school for an advanced degree involves predictions about how it will feel to be a student as well as predictions of long-term career preferences; buying a car involves a prediction of how it would feel to drive around in different cars. In each of these examples, the quality of the decision depends critically on the accuracy of the prediction; errors in predicting feelings are measured in units of divorce, dropout, career burnout and consumer dissatisfaction (Loewenstein and Schkade, 2003).

Empathy Gaps: People are often incorrect about what determines happiness, leading to prediction errors. In particular, the well-known empathy gaps, i.e., the inability to imagine opposite feelings when experiencing heightened emotion, be it happy or sad, lead to errors in predicting both feelings and behavior (Loewenstein, 2005). So, people seem to think that if disaster strikes it will take longer to recover emotionally than it actually does. Conversely, if a happy event occurs, people overestimate how long they will emotionally benefit from it.

Psychological Immune System: The cognitive bias above also indicates that agents may underestimate the proper function of their psychological immune systems. The psychological immune system is a system which helps fight off bad feelings that result from unpleasant situations (Kagan, 2006). This system is activated when humans are faced with potential or actual negative events in their life. The system functions to assist in protecting humans from extreme reactions to those negative events. Sharot, De Martino and Dolan (2008) studied how hedonic psychology affects our choices from a neural perspective. They combined participants' estimations of the pleasure they will derive from future events with fMRI data recorded while they imagined those events, both before and after making choices. It was found that activity in the caudate nucleus predicted the choice agents made when forced to choose between two alternatives they had previously rated equally. Moreover, post-choice the selected alternatives were valued more strongly than pre-choice, while discarded ones were valued less. This post-choice preference change was mirrored in the caudate nucleus response. The choice-sensitive preference observed above is similar to behavior driven by reinforcement learning.

VALUE AND CHOICE

"Neuroeconomics is a relatively new discipline that studies the computations that the brain carries out in order to make value-based decisions, as well as the neural implementation of those computations. It seeks to build a biologically sound theory of how humans make decisions that can be applied in both the natural and the social sciences." (Rangel, Camerer, and Montague, 2008)

"In a choice situation, we usually look at a few alternatives, sometimes including a small number that we generate for the purpose but more often limiting ourselves to those that are already known and available. These alternatives are generated or evoked in response to specific goals or drives (i.e. specific components of the utility function), so that different alternatives are generated when we are hungry from when we are thirsty; when we are thinking about our science from when we are thinking about our children." (Simon, 2005, p. 93)

The very basic economics starts with value assignment and choice making. However, traditional economics makes little effort to understand the cognitive and computation loading involved in this very fundamental economic activity. A number of recent studies have challenged the view that what we used to be taught may be misplaced when we take into account the value-assignment problem more seriously (Iyengar and Lepper, 2000; Schwartz, 2003). These studies lead us to question the impact of the dimensionality of choice space upon our behavior of value assignment and choice making. It seems that when the number of choices increases, the ability to make the best choice becomes problematic.

Going one step further, Louie, Grattan, and Glimcher (2008) attempt to theorize this paradox of choice by exploring the neural mechanism underlying value representation during decision-making and how such a mechanism influences choice behavior in the presence of alternative options. In their analysis, value assignment is relatively normalized when new alternatives are presented. The linear proportionate normalization is a simple example. Because value is relatively coded rather than absolutely coded, the value differences between two alternatives may become narrow when more alternatives are presented.

Intertemporal Choice

Agent-based economic models are dynamic. Time is an inevitable element, and the time preference becomes another important setting for agents in the agent-based models. However, in mainstream economic theory, the time preference has been largely standardized as an exponential discounting with a time-invariant discount rate. Recent studies, however, have found that people discount future outcomes more steeply when they have the opportunity for immediate gratification than when all outcomes occur in the future. This has led to the modification of the declining discount rates or hyperbolic-discounting (Laibson, 1997).

Frederick, Loewenstein, and O'Donoghue (2002) provided an extensive survey on the empirical studies showing that the observed discount rates are not constant over time, but appear to decline. Loewenstein (1988) has further demonstrated that discount rates can be dramatically affected by whether the change in delivery time of an outcome is framed as an acceleration or a delay from some temporal reference point. So, when asked whether they would be willing to wait for a month to receive $110 instead of receiving $100 today, most people choose $100 today. By contrast, when asked whether they would prefer to speed up the receipt of $110 in a month by receiving $100 today instead, most people exhibit patience and take the $110 in a month. This phenomenon has been used as evidence for the gain-loss asymmetry or the prospect theory. It has also been connected to the endowment effect, which predicts that people tend to value objects more highly after they come to feel that they own them (Kahneman, Knetsch and Thaler, 1990; Kahneman, 1991). The endowment effect explains the reluctance of people to part with assets that belong to their endowment. Nonetheless, Lerner, Small and Loewenstein (2004) show that the agents' mood, sad or neutral, can affect the appearance of this effect.

Query Theory: Recently, query theory, proposed by Johnson, Haeubl and Keinan (2007), has been used to explain this and other similar choice inconsistencies. Query theory assumes that preferences, like all knowledge, are subject to the processes and dynamics of memory encoding and retrieval, and explores whether memory and attentional processes can explain observed anomalies in evaluation and choice. Weber et al. (2007) showed that the directional asymmetry in discounting is caused by the different order in which memory is queried for reasons favoring immediate versus future consumption, with earlier queries resulting in a richer set of responses, and reasons favoring immediate consumption being generated earlier for delay vs. acceleration decisions.
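To make the contrast between a constant exponential discount rate and the declining (present-biased) rates discussed above concrete, the following minimal Python sketch compares an exponential discounter with a quasi-hyperbolic (beta-delta) discounter of the kind popularized by Laibson (1997), applied to the $100-today versus $110-in-a-month choice. The parameter values (beta = 0.7, delta = 0.99) are illustrative assumptions only, not estimates taken from any of the studies cited here.

# Illustrative sketch: exponential vs. quasi-hyperbolic (beta-delta) discounting.
# Parameter values are assumptions chosen for illustration only.

def exponential_value(reward, delay_months, delta=0.99):
    """Discounted value with a constant per-month discount factor delta."""
    return reward * (delta ** delay_months)

def quasi_hyperbolic_value(reward, delay_months, beta=0.7, delta=0.99):
    """Quasi-hyperbolic (beta-delta) value: only immediate rewards escape the beta penalty."""
    if delay_months == 0:
        return reward
    return beta * reward * (delta ** delay_months)

# Choice 1: $100 today vs. $110 in one month.
now_100 = quasi_hyperbolic_value(100, 0)
later_110 = quasi_hyperbolic_value(110, 1)
print("today vs. one month:", now_100, "vs.", round(later_110, 2))   # 100 vs. ~76.2 -> take $100 now

# Choice 2: the same pair pushed one year into the future
# ($100 in 12 months vs. $110 in 13 months).
far_100 = quasi_hyperbolic_value(100, 12)
far_110 = quasi_hyperbolic_value(110, 13)
print("12 vs. 13 months:", round(far_100, 2), "vs.", round(far_110, 2))  # ~62.0 vs. ~67.6 -> wait for $110

# An exponential discounter with the same delta never reverses its ranking:
print(round(exponential_value(100, 0), 2), round(exponential_value(110, 1), 2))
print(round(exponential_value(100, 12), 2), round(exponential_value(110, 13), 2))

The preference reversal produced by the beta parameter is one simple way to generate the declining discount rates reported in the survey above; the beta/delta decomposition also anticipates the two-system account taken up in the next subsection.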

Neural Representation of Hyperbolic Discounting

McClure et al. (2004) investigate the neural systems that underlie discounting the value of rewards based on the delay until the time of delivery. They test the theory that hyperbolic discounting results from the combined function of two separate brain systems. The beta system is hypothesized to place special weight on immediate outcomes, while the delta system is hypothesized to exert a more consistent weighting across time. They further hypothesize that beta is mediated by limbic structures and delta by the lateral prefrontal cortex and associated structures supporting higher cognitive functions. Extending McClure et al. (2004), Figner et al. (2008) conducted an fMRI study investigating participants' neural activation underlying acceleration vs. delay decisions. They found hyperbolic discounting only in the delay, but not the acceleration, function.

Risk

Risk preference plays an important role in many agent-based economic models, in particular agent-based financial models. The frequently used assumptions are CARA (Constant Absolute Risk Aversion), CRRA (Constant Relative Risk Aversion), HARA (Hyperbolic Absolute Risk Aversion), and mean-variance, but, so far, few have ever justified the use of any of these with a neural foundation. This question can be particularly hard because, with the recent development of neuroscience, we are inevitably pushed to ask a deeper question: what the risk is. How does the agent recognize the risk involved in his or her decision making? What may cause the perceived risk to deviate from the real risk? Is there any particular region in our brain which corresponds to a different order of moments, the statistics used to summarize the probabilistic uncertainty?

Neural Representation of Risk

One of the main issues currently discussed in neuroeconomics is the neural representation of risk. Through a large variety of risk experiments, it can be shown that many different parts of the brain are involved in decisions under risk, and they vary with experimental designs. Based on the activated areas of the brain, one may define a neural representation of the risk associated with a given experiment. Different kinds of risks may be differentiated by their different neural representations, and different risk-related concepts may also be distinguished in this way. For example, the famous Knight's distinction between uncertainty and risk can now, through delicate experimental designs, actually be distinguished by the associated neural representations. Using the famous Iowa Gambling Task, Lin et al. (2008) show that uncertainty is represented by the brain areas closely pertaining to emotion, whereas risk is associated with the prefrontal cortex. In this vein, Pushkarskaya et al. (2008) distinguish ambiguity from conflicts, and Mohr et al. (2008) separate behavioral risk from reward risk.

Identifying the neural representations of different risks may also shed light on the observed deviations of human behavior from probability-based predictions. For example, a number of experiments, such as Feldman's Experiment (Feldman, 1962) or the Iowa Gambling Task (Lin, 2008), have indicated that even though subjects are given a risk environment, they may still behave as if they are in an uncertain environment. It is left for further study what neural processes behind this pattern recognition may inhibit or enhance the discovery of the underlying well-defined probabilistic environment.

Risk Preference

Different assumptions of risk preference, such as the mean-variance, CARA, CRRA, or HARA, are used in economic theory, usually in an arbitrary

way. While agent-based modeling relies heavily on the idea of heterogeneity, preference or risk preference in most studies is normally assumed to be homogeneous. Little has been explored on the aggregate dynamics generated by a society of agents with heterogeneous risk preference.3 Nevertheless, it seems to be quite normal to see agents with heterogeneous risk preferences in neuroeconomic experiments (Paulsen et al., 2008).

Genetics have contributed in accounting for the difference in risk preference. Kuhnen and Chiao (2008) showed that several genes previously linked to emotional behavior and addiction are also found to be correlated with risk-taking investment decisions. They found that 5HTLPR ss allele carriers are more risk averse than those carrying the sl or ll alleles of the gene. D4DR 7-repeat allele carriers are more risk seeking than individuals without the 7-repeat allele. Individuals with the D2DR A1/A1 genotype have more stable risk preferences than those with the A1/A2 or A2/A2 genotype, while those with the D4DR 4-repeat allele have less stable preferences than people who do not have the 4-repeat allele.

One of the essential developments in neuroeconomics is to provide neural foundations of the risk preferences. It is assumed that the human brain actually follows the finance approach, encoding the various statistical inputs needed for the effective evaluation of the desirability of risky gambles. In particular, neurons in parts of the brain respond immediately (with minimal delay) to changes in expected rewards and with a short delay (about 1 to 2 seconds) to risk, as measured by the payoff variance (Preuschoff, Bossaerts and Quartz, 2006). Whether one can find evidence of higher-order risk (skewness aversion, for instance) remains an interesting issue.

Some initial studies indicate that risk preference may be context-dependent or event-driven, which, to some extent, can be triggered by how the risky environment is presented. d'Acremont and Bossaerts (2008) show that the dominance of mean-variance preference over the expected utility depends on the number of states. When the number of states increases, it is more likely that the mean-variance preference may fit the data better than the expected utility.

LEARNING AND THE DRPE HYPOTHESIS

One essential element of agent-based computational economics is the notion of autonomous agents, i.e., the agents who are able to learn and adapt on their own. It would have been a big surprise to us if neuroscience had not cared about learning. However, it will also be a surprise to us if the learning algorithms which we commonly use for the software agents can actually have their neural representations. Nonetheless, a few recent studies have pointed in this direction.

Studies start with how the brain encodes the prediction error, and how other neural modules react to these errors. The most famous hypothesis in this area is the Dopaminergic reward prediction error (DRPE) hypothesis. This hypothesis states that neurons that contain this neurotransmitter release dopamine in proportion to the difference between the predicted reward and the experienced reward of a particular event. Recent theoretical and experimental work on dopamine release has focused on the role that this neurotransmitter plays in learning and the resulting choice behavior. Neuroscientists have hypothesized that the role of dopamine is to update the value that humans and animals attach to different actions and stimuli, which in turn affects the probability that such an action will be chosen. If true, this theory suggests that a deeper understanding of dopamine will expand economists' understanding of how beliefs and preferences are formed, how they evolve, and how they play out in the act of choice.
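As a rough computational intuition for the DRPE idea, the sketch below implements a textbook reward-prediction-error update in Python: the "dopamine-like" signal is simply the difference between experienced and predicted reward, and it is used to update the value attached to a stimulus. This is a generic illustration under assumed parameter values (learning rate, payoff probability), not the specific model tested by the studies cited in this section.

import random

# Minimal reward-prediction-error (RPE) learning sketch.
# value: the agent's current prediction of reward for a stimulus.
# alpha: learning rate (assumed value, for illustration only).

def rpe_update(value, experienced_reward, alpha=0.1):
    """Return the prediction error and the updated value estimate."""
    prediction_error = experienced_reward - value   # the "dopamine-like" signal
    return prediction_error, value + alpha * prediction_error

random.seed(0)
value = 0.0
for trial in range(1, 21):
    reward = 1.0 if random.random() < 0.8 else 0.0   # stimulus pays off 80% of the time
    error, value = rpe_update(value, reward)
    if trial % 5 == 0:
        print(f"trial {trial:2d}: prediction error {error:+.2f}, value {value:.2f}")

# On average the prediction error shrinks as the value estimate approaches the
# true expected reward of 0.8, echoing the idea that one should learn more when
# uncertain and less when certain.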

Caplin and Dean (2008) formulate the DRPE hypothesis in axiomatic terms. Their treatment has precisely the revealed preference characteristic of identifying any possible reward function directly from the observables. They discuss the potential for measured dopamine release to provide insight into belief formation in repeated games and to learning theory, e.g., reinforcement learning. Their axiomatic model specifies three easily testable conditions for the entire class of reward prediction error (RPE) models. Briefly, the axioms will be satisfied if activity is (1) increasing with prize magnitude, (2) decreasing with lottery expected value, and (3) equivalent for outcomes from all lotteries with a single possible outcome. These three conditions are both necessary and sufficient for any RPE signal. If they hold, there is a way of defining experienced and predicted reward such that the signal encodes RPE with respect to those definitions. Rutledge et al. (2008) used the BOLD responses at the outcome time to test whether activity in the nucleus accumbens satisfies the axioms of the RPE model.

Klucharev et al. (2008) show that a deviation from the group opinion is detected by neural activity in the rostral cingulate zone (RCZ) and ventral striatum. These regions produce a neural signal similar to the prediction error signal in reinforcement learning that indicates a need for social conformity: a strong conflict-related signal in the RCZ and NAc triggers adjustment of judgments in line with group opinion. Using an olfactory categorization task performed by rats, Kepecs, Uchida, and Mainen (2008) attempt to obtain evidence for quantitative measurements of learning increments and test the hypothesis implied by reinforcement learning, i.e., one should learn more when uncertain and less when certain.

Studies also try to find the neural representations of different learning algorithms. The commonly used reinforcement learning and Bayesian learning are compared in Bossaerts et al. (2008), where they address the existence of the dual system.4 They consider the reflective system and the reflexive system as the neural representation of Bayesian learning and reinforcement learning, respectively. Using the trust game, they were able to stratify subjects into two groups. One group used well-adapted strategies. EEG recordings revealed activation of a reflective (conflict-resolution) system, evidently to inhibit impulsive emotional reactions after disappointing outcomes. Pearson et al. (2008) initiated another interesting line of research, i.e., the neural representations which distinguish exploration from exploitation, the two fundamental search strategies frequently used in various intelligent algorithms, say, genetic algorithms.

DUAL SYSTEM CONJECTURE

The dual system conjecture generally refers to the hypothesis that human thinking and decision-making are governed by two different but interacting systems. This conjecture has been increasingly recognized as being influential in psychology (Kahneman, Diener, and Schwarz, 2003), neural science (McClure, 2004), and economics. The two systems are an affective system and a deliberative system (Loewenstein and O'Donoghue, 2005) or a reflexive system and a reflective system (Lieberman, 2003). The affective system is considered to be myopic, activated by environmental stimuli, and primarily driven by affective states. The deliberative system is generally described as being goal-oriented and forward-looking. The former is associated with the areas of the brain that we have labeled the ventral striatum (nucleus accumbens, ventral caudate, and ventral putamen), the right striatum, neostriatum and amygdala, among others, whereas the latter is associated with the areas of the brain that we have labeled the ventromedial and dorsolateral prefrontal and anterior cingulate, among others.

The dual system of the brain has become the neuroeconomic area which economic theorists take the most seriously. This has also helped with the formation of the new field known as neuroeconomic theory. A number of dual-process models have been proposed in economics with applications to intertemporal choice (Loewenstein

and O'Donoghue, 2005; Fudenberg and Levine, 2006; Brocas and Carrillo, 2008), risk preferences (Loewenstein and O'Donoghue, 2005), and social preferences (Loewenstein and O'Donoghue, 2005). All these models view economic behavior as being determined by the interaction between two different systems.

The application of the dual system conjecture to learning is just the beginning. Earlier, we have mentioned the cognitive loading between different learning algorithms, such as reinforcement learning vs. Bayesian learning (see Section 4). This issue has been recently discussed in experimental economics (Charness and Levin, 2005), and now also in neuroeconomics (Bossaerts et al., 2008).

Software Agents with Neurocognitive Dual Systems

While agents with dual systems have been considered to be a new research direction in neuroeconomic theory (Brocas and Carrillo, 2008a; Brocas and Carrillo, 2008b), software agents or autonomous agents in agent-based modeling mostly follow a single system. However, the dual system interpretation exists for many agent-based economic models. Consider the fundamentalist-chartist model as an example, where the fundamentalist's and chartist's behavior can be differentiated by the associated neural systems, say, assuming the former is associated with a deliberative system while the latter is associated with the affective system.

Another example is individual learning vs. social learning. These two learning schemes have been frequently applied to model the learning behavior in experiments, and their fit to the experimental data differs (Hanaki, 2005). Agent-based simulation has also shown that their emergent patterns are different. For example, in the context of an artificial stock market, Yeh and Chen (2001) show that agents using individual learning behave differently from agents using social learning in terms of market efficiency, price dynamics and trading volume. If individual learning can be associated with, say, the deliberative system, and social learning can be connected to the affective system, then the dual system can also be applied to agent-based modeling. This issue opens the future to collaboration between agent-based economics and neuroeconomics.

FROM MODULAR MIND/BRAIN TO MODULAR PREFERENCE

At present, modularity (Simon, 1965) is still not a part of agent-based economic modeling. This absence is a little disappointing since ACE is regarded as a complement to mainstream economics in terms of articulating the mechanism of evolution and automatic discovery. One way of making progress is to enable autonomous agents to discover the modular structure of their surroundings, and hence they can adapt by using modules. This is almost equivalent to causing their "brain" or "mind" to be designed in a modular way as well.

The only available work in agent-based economic modeling which incorporates the idea of modularity is that related to the agent-based models of innovation initiated by Chen and Chie (2004). They proposed a modular economy whose demand side and supply side both have a decomposable structure. While the decomposability of the supply side, i.e., production, has already received intensive treatment in the literature, the demand side has not. Inspired by the study of neurocognitive modularity, Chen and Chie (2004) assume that the preference of consumers can be decomposable.5 In this way, the demand side of the modular economy corresponds to a market composed of a set of consumers with modular preference.

In the modular economy, the assumption of the modular preference is made in the form of a dual relationship with the assumption of modular production. Nevertheless, whether in reality the two can have a nice mapping, e.g., a one-to-one relationship, is an issue related to the distinction

between structural modularity and functional modularity. While in the literature this distinction has been well noticed and discussed, "recent progress in developmental genetics has led to remarkable insights into the molecular mechanisms of morphogenesis, but has at the same time blurred the clear distinction between structure and function." (Callebaut and Rasskin-Gutman, 2005, p. 10)

The modular economy considered by Chen and Chie (2004) does not distinguish between the two kinds of modularity, and they are assumed to be identical. One may argue that the notion of modularity that is suitable for preference is structural, i.e., what it is, whereas the one that is suitable for production is process, i.e., what it does. However, this understanding may be partial. Using the LISP (List Processing) parse-tree representation, Chen and Chie (2004) have actually integrated the two kinds of modularity. Therefore, consider drinking coffee with sugar as an example. Coffee and sugar are modules for both production and consumption. Nevertheless, for the former, producers add sugar to coffee to deliver the final product, whereas for the latter, the consumers drink the mixture knowing of the existence of both components or by "seeing" the development of the product.

Chen and Chie (2007) tested the idea of augmented genetic programming (augmented with automatically defined terminals) in a modular economy. Chen and Chie (2007) considered an economy with two oligopolistic firms. While both of these firms are autonomous, they are designed differently. One firm is designed with simple GP (SGP), whereas the other firm is designed with augmented GP (AGP). These two different designs match the two watchmakers considered by Simon (1965). The modular preferences of consumers not only define the search space for firms, but also a search space with different hierarchies. While it is easier to meet consumers' needs with very low-end products, the resulting profits are negligible. To gain higher profits, firms have to satisfy consumers up to higher hierarchies. However, consumers become more and more heterogeneous when their preferences are compared at higher and higher hierarchies, which calls for a greater diversity of products.6 It can then be shown that the firm using a modular design performs better than the firm not using a modular design, as Simon predicted.

CONCLUDING REMARKS: AGENT BASED OR BRAIN BASED?

Can we relate agent-based economics to brain-based economics (neuroeconomics)? Can we use the knowledge which we obtain from neuroeconomics to design software agents? One of the features of agent-based economics is the emphasis on the heterogeneity of agents. This heterogeneity may come from behavioral genetics. Research has shown that genetics has an effect on our risk preference. Kuhnen and Chiao (2008), Jamison et al. (2008), and Weber et al. (2008) show that preferences are affected by the genes and/or education (environment). With the knowledge of genetics and neuroeconomics, the question is: How much more heterogeneity do we want to include in agent-based modeling? Does it really matter?

Heterogeneity may also result from age. The neuroeconomics evidence shows that certain functions of the brain will age. The consequence is that elderly people will make some systematic errors more often than young people, and age will affect financial decisions as well (Samanez Larkin, Kuhnen, and Knutson, 2008). Thus the same question arises: when engaging in agent-based modeling, should we take age heterogeneity into account? So, when a society ages, should we constantly adjust our agent-based model so that it can match the empirical age distribution of the society? So far we have not seen any agent-based modeling that features the aspect of aging.

Neuroeconomics does encourage the modular design of agents, because our brain is a modular structure. Many different modules in the brain

have been identified. Some modules are related to emotion, some are related to cognition, and some are related to self-control. When human agents are presented with different experimental settings, we often see different combinations of these modules.

ACKNOWLEDGMENT

The author is grateful for the financial support provided by the NCCU Top University Program. NSC research grant No. 95-2415-H-004-002-MY3 is also gratefully acknowledged.

REFERENCES

Baldassarre, G. (2007, June). Research on brain and behaviour, and agent-based modelling, will deeply impact investigations on well-being (and theoretical economics). Paper presented at International Conference on Policies for Happiness, Certosa di Pontignano, Siena, Italy.
Bossaerts, P., Beierholm, U., Anen, C., Tzieropoulos, H., Quartz, S., de Peralta, R., & Gonzalez, S. (2008, September). Neurobiological foundations for "dual system" theory in decision making under uncertainty: fMRI and EEG evidence. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Brocas, I., & Carrillo, J. (2008a). The brain as a hierarchical organization. The American Economic Review, 98(4), 1312–1346. doi:10.1257/aer.98.4.1312
Brocas, I., & Carrillo, J. (2008b). Theories of the mind. American Economic Review: Papers & Proceedings, 98(2), 175–180.
Callebaut, W., & Rasskin-Gutman, D. (Eds.). (2005). Modularity: Understanding the development and evolution of natural complex systems. MA: MIT Press.
Caplin, A., & Dean, M. (2008). Economic insights from "neuroeconomic" data. The American Economic Review, 98(2), 169–174. doi:10.1257/aer.98.2.169
Charness, G., & Levin, D. (2005). When optimal choices feel wrong: A laboratory study of Bayesian updating, complexity, and affect. The American Economic Review, 95(4), 1300–1309. doi:10.1257/0002828054825583
Chen, S.-H. (2008). Software-agent designs in economics: An interdisciplinary framework. IEEE Computational Intelligence Magazine, 3(4), 18–22. doi:10.1109/MCI.2008.929844
Chen, S.-H., & Chie, B.-T. (2004). Agent-based economic modeling of the evolution of technology: The relevance of functional modularity and genetic programming. International Journal of Modern Physics B, 18(17-19), 2376–2386. doi:10.1142/S0217979204025403
Chen, S.-H., & Chie, B.-T. (2007). Modularity, product innovation, and consumer satisfaction: An agent-based approach. In Yin, H., Tino, P., Corchado, E., Byrne, W., & Yao, X. (Eds.), Intelligent Data Engineering and Automated Learning (pp. 1053–1062). Heidelberg, Germany: Springer. doi:10.1007/978-3-540-77226-2_105
Chen, S.-H., & Huang, Y.-C. (2008). Risk preference, forecasting accuracy and survival dynamics: Simulations based on a multi-asset agent-based artificial stock market. Journal of Economic Behavior & Organization, 67(3), 702–717. doi:10.1016/j.jebo.2006.11.006
d'Acremont, M., & Bossaerts, P. (2008, September). Grasping the fundamental difference between expected utility and mean-variance theories. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Feldman, J. (1962). Computer simulation of cognitive processes. In Borko, H. (Ed.), Computer applications in the behavioral sciences. Upper Saddle River, NJ: Prentice Hall.
Figner, B., Johnson, E., Lai, G., Krosch, A., Steffener, J., & Weber, E. (2008, September). Asymmetries in intertemporal discounting: Neural systems and the directional evaluation of immediate vs future rewards. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Fischhoff, B. (1991). Value elicitation: Is there anything in there? The American Psychologist, 46, 835–847. doi:10.1037/0003-066X.46.8.835
Frederick, S., Loewenstein, G., & O'Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, XL, 351–401. doi:10.1257/002205102320161311
Fudenberg, D., & Levine, D. (2006). A dual-self model of impulse control. The American Economic Review, 96(5), 1449–1476. doi:10.1257/aer.96.5.1449
Hanaki, N. (2005). Individual and social learning. Computational Economics, 26, 213–232. doi:10.1007/s10614-005-9003-5
Iyengar, S., & Lepper, M. (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of Personality and Social Psychology, 79(6), 995–1006. doi:10.1037/0022-3514.79.6.995
Jamison, J., Saxton, K., Aungle, P., & Francis, D. (2008). The development of preferences in rat pups. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Jevons, W. (1879). The Theory of Political Economy, 2nd Edition. Edited and introduced by R. Black (1970). Harmondsworth: Penguin.
Johnson, E., Haeubl, G., & Keinan, A. (2007). Aspects of endowment: A query theory account of loss aversion for simple objects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 461–474. doi:10.1037/0278-7393.33.3.461
Kagan, H. (2006). The Psychological Immune System: A New Look at Protection and Survival. Bloomington, IN: AuthorHouse.
Kahneman, D., Diener, E., & Schwarz, N. (Eds.). (2003). Well-Being: The Foundations of Hedonic Psychology. New York, NY: Russell Sage Foundation.
Kahneman, D., Knetsch, J., & Thaler, R. (1990). Experimental tests of the endowment effect and the Coase theorem. The Journal of Political Economy, 98, 1325–1348. doi:10.1086/261737
Kahneman, D., Knetsch, J., & Thaler, R. (1991). Anomalies: The endowment effect, loss aversion, and status quo bias. The Journal of Economic Perspectives, 5(1), 193–206.
Kahneman, D., Ritov, I., & Schkade, D. (1999). Economic preferences or attitude expressions? An analysis of dollar responses to public issues. Journal of Risk and Uncertainty, 19, 203–235. doi:10.1023/A:1007835629236
Kepecs, A., Uchida, N., & Mainen, Z. (2008, September). How uncertainty boosts learning: Dynamic updating of decision strategies. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Klucharev, V., Hytonen, K., Rijpkema, M., Smidts, A., & Fernandez, G. (2008, September). Neural mechanisms of social decisions. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.

Kuhnen, C., & Chiao, J. (2008, September). Ge- Loewenstein, G., & O’Donoghue, T. (2005). Ani-
netic determinants of financial risk taking. Paper mal spirits: Affective and deliberative processes
presented at Annual Conference on Neuroeconom- in economic behavior. Working Paper. Carnegie
ics, Park City, Utah. Mellon University, Pittsburgh.
Laibson, D. (1997). Golden eggs and hyperbolic Loewenstein, G., & Schkade, D. (2003). Wouldn’t
discounting. The Quarterly Journal of Economics, it be nice?: Predicting future feelings . In Kahne-
12(2), 443–477. doi:10.1162/003355397555253 man, D., Diener, E., & Schwartz, N. (Eds.), He-
donic Psychology: The Foundations of Hedonic
Lerner, J., Small, D., & Loewenstein, G. (2004).
Psychology (pp. 85–105). New York, NY: Russell
Heart strings and purse strings: Carry-over effects
Sage Foundation.
of emotions on economic transactions. Psycho-
logical Science, 15, 337–341. doi:10.1111/j.0956- Louie, K., Grattan, L., & Glimcher, P. (2008).
7976.2004.00679.x Value-based gain control: Relative reward nor-
malization in parietal cortex. Paper presented at
Lichtenstein, S., & Slovic, P. (Eds.). (2006).
Annual Conference on Neuroeconomics, Park
The Construction of Preference. Cambridge,
City, Utah.
UK: Cambridge University Press. doi:10.1017/
CBO9780511618031 MacLean, P. (1990). The Triune Brain in Evolu-
tion: Role in Paleocerebral Function. New York,
Lieberman, M. (2003). Reflective and reflexive
NY: Plenum Press.
judgment processes: A social cognitive neurosci-
ence approach . In Forgas, J., Williams, K., & von McClure, S., Laibson, D., Loewenstein, G., &
Hippel, W. (Eds.), Social Judgments: Explicit and Cohen, J. (2004). Separate neural systems value
Implicit Processes (pp. 44–67). New York, NY: immediate and delayed monetary rewards. Sci-
Cambridge University Press. ence, 306, 503–507. doi:10.1126/science.1100907
Lin, C.-H., Chiu, Y.-C., Lin, Y.-K., & Hsieh, J.-C. Mohr, P., Biele, G., & Heekeren, H. (2008, Septem-
(2008, September). Brain maps of Soochow Gam- ber). Distinct neural representations of behavioral
bling Task. Paper presented at Annual Conference risk and reward risk. Paper presented at Annual
on Neuroeconomics, Park City, Utah. Conference on Neuroeconomics, Park City, Utah.
Lo, A. (2005). Reconciling efficient markets Paulsen, D., Huettel, S., Platt, M., & Brannon, E.
with behavioral finance: The adaptive market (2008, September). Heterogeneity in risky deci-
hypothesis. The Journal of Investment Consult- sion making in 6-to-7-year-old children. Paper
ing, 7(2), 21–44. presented at Annual Conference on Neuroeco-
nomics, Park City, Utah.
Loewenstein, G. (1988). Frames of mind in
intertemporal choice. Management Science, 34, Payne, J., Bettman, J., & Johnson, E. (1993).
200–214. doi:10.1287/mnsc.34.2.200 The adaptive decision maker. New York, NY:
Cambridge University Press.
Loewenstein, G. (2005). Hot-cold empathy gaps
and medical decision making. Health Psychology, Pearson, J., Hayden, B., Raghavachari, S., & Platt,
24(4), S49–S56. doi:10.1037/0278-6133.24.4.S49 M. (2008) Firing rates of neurons in posterior
cingulate cortex predict strategy-switching in a
k-armed bandit task. Paper presented at Annual
Conference on Neuroeconomics, Park City, Utah.

Preuschoff, K., Bossaerts, P., & Quartz, S. (2006). Neural differentiation of expected reward and risk in human subcortical structures. Neuron, 51(3), 381–390. doi:10.1016/j.neuron.2006.06.024
Pushkarskaya, H., Liu, X., Smithson, M., & Joseph, J. (2008, September). Neurobiological responses in individuals making choices in uncertain environments: Ambiguity and conflict. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Rangel, A., Camerer, C., & Montague, R. (2008). A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience, 9, 545–556. doi:10.1038/nrn2357
Rutledge, R., Dean, M., Caplin, A., & Glimcher, P. (2008, September). A neural representation of reward prediction error identified using an axiomatic model. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Samanez Larkin, G., Kuhnen, C., & Knutson, B. (2008). Financial decision making across the adult life span. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Schwartz, B. (2003). The Paradox of Choice: Why More Is Less. New York, NY: Harper Perennial.
Sharot, T., De Martino, B., & Dolan, R. (2008, September). Choice shapes, and reflects, expected hedonic outcome. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Simon, H. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69, 99–118. doi:10.2307/1884852
Simon, H. (1956). Rational choice and the structure of the environment. Psychological Review, 63, 129–138. doi:10.1037/h0042769
Simon, H. (1965). The architecture of complexity. General Systems, 10, 63–76.
Simon, H. (2005). Darwinism, altruism and economics. In K. Dopfer (Ed.), The Evolutionary Foundations of Economics (pp. 89–104). Cambridge, UK: Cambridge University Press.
Slovic, P. (1995). The construction of preference. The American Psychologist, 50, 364–371. doi:10.1037/0003-066X.50.5.364
Weber, B., Schupp, J., Reuter, M., Montag, C., Siegel, N., Dohmen, T., et al. (2008). Combining panel data and genetics: Proof of principle and first results. Paper presented at Annual Conference on Neuroeconomics, Park City, Utah.
Weber, E., Johnson, E., Milch, K., Chang, H., Brodscholl, J., & Goldstein, D. (2007). Asymmetric discounting in intertemporal choice: A query-theory account. Psychological Science, 18, 516–523. doi:10.1111/j.1467-9280.2007.01932.x
Yeh, C.-H., & Chen, S.-H. (2001). Market diversity and market efficiency: The approach based on genetic programming. Journal of Artificial Simulation of Adaptive Behavior, 1(1), 147–165.

ENDNOTES

1. See also Baldassarre (2007). While it has a sharp focus on the economics of happiness, the idea of building economic agents upon the empirical findings of psychology and neuroscience and placing these agents in an agent-based computational framework is the same as what we argue here. From Baldassarre (2007), the reader may also find a historical development of the cardinal utility and ordinal utility in economics. It has been a while since economists first considered that utility is a very subjective thing which cannot be measured in a scientific way, so that interpersonal comparison of utility is impossible, which further causes any redistribution policy to lose its ground.

2. It is not clear where preferences come from, i.e., their formation and development process, nor by when in time they come to their steady state and become fixed. Some recent behavioral studies have even asserted that people do not have preferences, in the sense in which that term is used in economic theory (Kahneman, Ritov, and Schkade, 1999).
3. For an exception, see Chen and Huang (2008).
4. See Section 5 for the dual system conjecture.
5. Whether one can build preference modules upon the brain/mind modules is of course an issue deserving further attention.
6. If the consumers' preferences are randomly generated, then it is easy to see this property through the combinatoric mathematics. On the other hand, in the parlance of economics, moving along the hierarchical preferences means traveling through different regimes, from a primitive manufacturing economy to a quality service economy, from the mass production of homogeneous goods to the limited production of massive quantities of heterogeneous customized products.

Chapter 4
Agents in Quantum and
Neural Uncertainty
Germano Resconi
Catholic University Brescia, Italy

Boris Kovalerchuk
Central Washington University, USA

ABSTRACT
This chapter models quantum and neural uncertainty using a concept of the Agent–based Uncertainty
Theory (AUT). The AUT is based on complex fusion of crisp (non-fuzzy) conflicting judgments of agents.
It provides a uniform representation and an operational empirical interpretation for several uncertainty
theories such as rough set theory, fuzzy sets theory, evidence theory, and probability theory. The AUT
models conflicting evaluations that are fused in the same evaluation context. This agent approach gives
also a novel definition of the quantum uncertainty and quantum computations for quantum gates that are
realized by unitary transformations of the state. In the AUT approach, unitary matrices are interpreted
as logic operations in logic computations. We show that by using permutation operators any type of
complex classical logic expression can be generated. With the quantum gate, we introduce classical logic
into the quantum domain. This chapter connects the intrinsic irrationality of the quantum system and the
non-classical quantum logic with the agents. We argue that AUT can help to find meaning for quantum
superposition of non-consistent states. Next, this chapter shows that the neural fusion at the synapse
can be modeled by the AUT in the same fashion. The neuron is modeled as an operator that transforms
classical logic expressions into many-valued logic expressions. The motivation for such neural network
is to provide high flexibility and logic adaptation of the brain model.

DOI: 10.4018/978-1-60566-898-7.ch004

INTRODUCTION

We model quantum and neural uncertainty using the Agent–based Uncertainty Theory (AUT) that uses complex fusion of crisp conflicting judgments of agents. AUT represents and interprets uniformly several uncertainty theories such as rough set theory, fuzzy sets theory, evidence theory, and probability theory. AUT exploits the fact that agents as independent entities can give

conflicting evaluations of the same attribute. It models conflicting evaluations that are fused in the same evaluation context. If only one evaluation is allowed for each statement in each context (world), as in the modal logic, then there is no logical uncertainty. The situation that the AUT models is inconsistent (fuzzy) and is very far from the situation modeled by the traditional logic that assumes consistency. We argue that the AUT, by incorporating such inconsistent statements, is able to model different types of conflicts and their fusion known in many-valued logics, fuzzy logic, probability theory and other theories.

This chapter shows how the agent approach can be used to give a novel definition of the quantum uncertainty and quantum computations for quantum gates that are realized by unitary transformations of the state. In the AUT approach, unitary matrices are interpreted as logic operations in logic computations. It is shown that by using permutation operators, which are unitary matrices, any type of complex classical logic expression can be generated. The classical logic has well-known difficulties in quantum mechanics. Now with the quantum gate we introduce classical logic into the quantum domain. We connect the intrinsic irrationality of the quantum system and the non-classical quantum logic with the agents. We argue that Agent-based Uncertainty Theory (AUT) can help to find meaning for quantum superposition of non-consistent states, for which one particle can be at different points at the same time or the same particle can have spin up and down at the same time.

Next, this chapter shows that the neural fusion at the synapse can be modeled by the AUT. Agents in the neural network are represented by logic input values in the neuron itself. In the ordinary neural networks any neuron is a processor that models a Boolean function. We change the point of view and consider a neuron as an operator that transforms classical logic expressions into many-valued logic expressions or, in other words, changes crisp sets into fuzzy sets. This neural network consists of neurons at two layers. At the first one, neurons or agents implement the classical logic operations. At the second layer, neurons or nagents (neuron agents) compute the same logic expression with different results. These are many-valued neurons that fuse results provided by different agents at the first layer. They fuse conflicting or inconsistent situations. The network is based on use of the logic of uncertainty instead of the classical logic. The motivation for such a neural network is to provide high flexibility and logic adaptation of the brain model. In this brain model, communication among agents is specified by the fusion process in the neural elaboration.

The probability calculus does not incorporate explicitly the concepts of irrationality or logic conflict of an agent's state. It misses structural information at the level of individual objects, but preserves global information at the level of a set of objects. Given a die, the probability theory studies frequencies of the different faces E={e} as independent (elementary) events. This set of elementary events E has no structure. It is only required that elements of E are mutually exclusive and complete, that is, no other alternative is possible. The order of its elements is irrelevant to the probabilities of each element of E. No irrationality or conflict is allowed in this definition relative to mutual exclusion. The classical probability calculus does not provide a mechanism for modelling uncertainty when agents communicate (collaborate or conflict). Recent work by Halpern (2005) is an important attempt to fill this gap.

This chapter is organized as follows: Sections 2 and 3 provide a summary of the AUT starting from concepts and definitions. Section 4 presents links between quantum mechanics and first order conflicts in the AUT. Section 5 discusses the neural images of the AUT. Section 6 concludes this chapter.

CONCEPTS AND DEFINITIONS

Now we will provide more formal definitions of AUT concepts. It is done first for individual agents, then for sets of agents. Consider a set of agents G = {g1, g2, …, gn}. Each agent gk assigns a binary true/false value v ∈ {True, False} to proposition p. To show that v was assigned by the agent gk we use the notation gk(p) = vk.

Definition. A triple g = <N, AR, AA> is called an agent g if N is a label that is interpreted as the agent's name, AR is a set of truth-evaluation actions, and AA is a set of non-truth-evaluation actions associated with name N. For instance, an agent g called Professor has a set of truth-evaluation actions AR such as grading students' answers, while delivering a lecture is in another category AA.

Definition. An agent g is called a reasoning agent if g assigns a truth-value v(p) to any proposition p from a set of propositions S and any logical formula based on S.

The actions of a general agent may or may not include truth-evaluation actions. From a mathematical viewpoint a reasoning agent g serves as a mapping,

g: p → v(p)

While natural agents have both AR and AA, artificial software agents and robots may have only one of them.

Definition. A set of reasoning agents G is called totally consistent for proposition p if any agent g from G always provides the same truth value for p.

In other words, all agents in G are in concord or logical coherence for the same proposition. Thus, changing the agent does not change the logic value v(p) for a totally consistent set of agents. Here v(p) is global for G and has no local variability that is independent of the individual agent's evaluation. The classical logic is applicable for such a set of consistent (rational) agents.

Definition. A set of reasoning agents G is called inconsistent for proposition p if there are two subsets of agents G1, G2 such that agents from them provide different truth values for p.

Multiple reasons lead to agents' inconsistency. The most general one is the context (or hidden variables) in which each agent is evaluating statement p. Even in a relatively well formalized environment, context is not fully defined. Agent g1 may evaluate p in context "abc", but agent g2 in context "ab". Agent (robot) g2 may have no sensor to obtain signal c or abilities to process and reason correctly with input from that sensor (e.g., a color blind agent with a panchromatic camera).

Definition. Let S be a set of propositions, S = {p1, p2, …, pn}; then the set ¬S = {¬p1, ¬p2, …, ¬pn} is called a complementary set of S.

Definition. A set of reasoning agents G is called S-only-consistent if agents {g} are consistent only for propositions in S = {p1, p2, …, pn} and are inconsistent in the complementary set ¬S.

The evaluation of p is a vector-function v(p) = (v1(p), v2(p), …, vn(p)) for a set of agents G that we will represent as follows:

       ( g1   g2   ...  gn-1   gn )
v(p) = (                          )
       ( v1   v2   ...  vn-1   vn )

A weighted fusion of the agents' evaluations of a disjunction can then be written as

μ(p ∨ q) = [w1(v1(p) ∨ v1(q)) + … + wN(vN(p) ∨ vN(q))] / N        (1)

An example of the logic evaluation by five agents is shown below for p = "A > B". Here A > B is true for agents g1, g3, and g5, and it is false for the agents g2 and g4.

Kolmogorov's axioms of the probability theory are based on a totally consistent set of agents (for a set of statements) on mutual exclusion of elementary events. It follows from the definitions below for a set of events E = {e1, e2, …, en}.
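As a small illustration of the ideas above, the Python sketch below encodes each agent's crisp evaluation of a proposition as a Boolean and fuses the evaluations of a conflicting set of agents with a weighted frequency in the spirit of Eq. (1). The five truth values follow the "A > B" example in the text; the second proposition q, the equal weights, and the mapping of True/False to 1/0 are assumptions made only for this illustration.

# Sketch: crisp agent evaluations of p = "A > B" and their weighted fusion.
# Equal weights summing to one are assumed, so the weighted sum is the simple frequency.

agents = ["g1", "g2", "g3", "g4", "g5"]
v_p = {"g1": True, "g2": False, "g3": True, "g4": False, "g5": True}   # evaluations of p from the text
v_q = {"g1": True, "g2": True, "g3": False, "g4": False, "g5": True}   # a second, hypothetical proposition q
weights = {g: 1 / len(agents) for g in agents}

def fuse(values, weights):
    """Weighted frequency of True among the agents' crisp evaluations."""
    return sum(weights[g] for g in values if values[g])

def fuse_or(vp, vq, weights):
    """Fuse the agent-wise disjunction, mirroring the form of mu(p OR q) in Eq. (1)."""
    return sum(weights[g] * (vp[g] or vq[g]) for g in vp)

print("mu(p)      =", fuse(v_p, weights))      # 0.6: three of five agents accept p
print("mu(p or q) =", fuse_or(v_p, v_q, weights))  # 0.8: agent-wise OR, then weighted average

The point of the sketch is only that the fused value lies strictly between 0 and 1 exactly because the agents are in conflict, even though every individual evaluation remains crisp.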

Definition. Set E={e1.e2,…,en} is called a set of  g


 g2 g3 g4 g 5 
f (p) =  1 
elementary events (or mutually exclusive events) true false true false true 
for predicate El if
(2)
∀ ei ej ∈ A El(ei) ∨ El(ej) =True and
El(ei)∧El(ej)=False. For example, four agents using four criteria
can produce v(p) as follows:
In other words, event ei is an elementary event
 g1 g2 ... gn 
(El(ei)=true) if for any j, j≠i events ei and ej cannot 
happen simultaneously, that is probability P(ei ∧ ej) C v1,1 v1,2 ... v1,n 
 1
= 0 and P(ei v ej) = P(ei) + P(ej). Property P(ei∧ej)=0 v(p) = C 2 v2,1 v2,2 ... v2,n 
is the mutual exclusion axiom (ME- axiom).  
 ... ... ... ... ... 
Let S={p1,p2,.., pn} be a set of statements,  
C m vm ,1 vm ,2 ... vm ,n 
where pi=p(ei)=True if and only if event ei is an 
elementary event. In probability theory, p(ei) is
not associated with any specific agent. It is as- Note that introduction of a set of criteria C
sumed to be a global property (applicable to all explains self-conflict, but does not remove it. The
agents). In other words, statements p(ei) are totally agent still needs to resolve the ultimate preference
consistent for all agents. contradiction having a goal, say, to buy only one
Definition. A set of agent S = {g1.g2,…,gn} is and better car. It also does not resolve conflict
called a ME-rational set of agents if among different agents if they need to buy a car
jointly.
∀ ei ej ∈ A, ∀ gi ∈ S, p(ei) ∨ p(ej) =True and If agent g can modify criteria in C making
p(ei)∧p(ej)=False. them consistent for p then g can resolves self-
conflict. The agent can be in a logic conflict state
In other words, these agents are totally consis- because of inability to understand the complex
tent or rational on Mutual Exclusion. Previously context and to evaluate criteria. For example,
we assumed that each agent assigns value v(p) in the stock market environment, some traders
and we did not model this process explicitly. Now quite often do not understand a complex context
we introduce a set of criteria C={C1,C2,…,Cm} by and rational criteria of trading. These traders can
which an agent can decide if a proposition p is true be in logic conflict exhibiting chaotic, random,
or false, i.e., now v(p) = v(p,Ci), which means that and impulsive behavior. They can sell and buy
p is evaluated by using the criterion Ci. stocks, exhibiting logic conflicting states “sell”
Definition Given a set of criteria C, agent g is = p and “buy” = ¬p in the same market situation
in a self-conflicting state if that appears as irrational behavior, which means
that the statement p ∧ ¬p can be true.
∃ Ci, Cj (Ci, Cj ∈ C) & v(p,Ci) ≠ v(p,Cj) A logical structure of self-conflicting states and
agents is much more complex than it is without
In other words, an agent is in a self–conflicting self-conflict. For m binary criteria C1, C2,…Cm
state if two criteria exist such that p is true for one that evaluate the logic value for the same attribute,
of them and false for another one. With the explicit there are 2m possible states and only two of them
set of criteria, the logic evaluation function v(p) is (all true or all false values) do not exhibit conflict
not vector-function any more, but it is expanded between criteria.
to be a matrix function as shown below:

53
Agents in Quantum and Neural Uncertainty

Figure 1. A set of 20 agents in the first order of


FRAMEWORK OF FIRST ORDER
logic conflict
OF CONFLICT LOGIC STATE

Definition. A set of agents G is in a first order


of conflicting logic state (first order conflict, for
short) if

∃ gi, gj (gi, gj ∈ G) & v(p,gi) ≠ v(p,gj).

In other words, there are agents gi and gj in G


for which exist different values vi, vj in

 g1 g2 g3 g 4  Fusion Process

C true false false true 
 1
If a single decision must be made at the first
v(p) = C 2 false false true true 
  order of conflict, then we must introduce a fu-
C 3 true false true true 
  sion process of the logic values of proposition p
C 4 false false true true 
 given by all agents. A basic way to do this is to
compute the weighted frequency of logic value
given by all agents:
A set ofagents G is in the first order of conflict if
g
 g2 ... g n-1 g n 
G(A>B) ∩ G(A<B)= ∅ , G(A>B) ≠ ∅, G(A>B) v(p) =  1  (3)
 v1 v2 ... vn-1 vn 
∪ G(A<B)= G.

The following definition presents this idea in where


general terms.  v (p )T w 
 1   1
Definition. A set of agents G is in a First Order    
v2 (p ) w2 
Conflict (FOC) for proposition p if m(p ) = w1v1 (p ) + .... + wn vn (p ) = 



 
 
 ...   ... 
   
v (p ) w 
G(p) ∩ G(¬p)= ∅, and G(p) ≠ ∅, G(p) ∪  n   n 
G(¬p)= G. are two vectors in the space of the agents’ evalu-
ations. The first vector contains all logic states
Figure 1 shows a set of 20 agents in the logic (True/False) for all agents, the second vector (with
conflicting state, where 7 white agents are in the  v (p ) w 
 1   1
   
state True and 13 black agents are in the state False property  v2 ( p )

 w2 
 and   ) contains non-negative
for the same proposition p = “A > B”.  ...   ... 
   
vn ( p ) wn 
Below we show that at the first order of con-    
flicts, AND and OR operations should differ from weights (utilities) that are given to each agent in
the classical logic operations and should be vec- the fusion process. In a simple frequency case,
tor operations in the space of the agents’ evalu- each weight is equal to 1/n. At first glance, μ(p)
ations (agents space). The vector operations reflect is the same as used in the probability and utility
a structure of logic conflict among coherent in- theories. However, classical axioms of the prob-
dividual agent evaluations. ability theory have no references to agents produc-

54
Agents in Quantum and Neural Uncertainty

ing initial uncertainty values and do not violate |G(p ∨ q)| = max(|G(p)|, |G(q)|) + |G(¬ p∧ q)|.
the mutual exclusion. Below we define vector (5)
logic operations for the first order of conflict
logic states v(p). If G is a set of agents at the first order of
Definition conflicts and

n |G (p) | ≤ | G (q) |
∑ wk =1
k =1
then ∧ and ∨ logic operations satisfy the follow-
ing properties
v(p ∧ q ) = v1 (p) ∧ v1 (q ),..., vn (p) ∧ vn (q ) ,
|G(p ∧ q)| = min(|G(p)|, |G(q)|) - |G(¬ q ∧ p)|,

v(p ∨ q ) = v1 (p) ∨ v1 (q ),..., vn (p) ∨ vn (q ) |G(p ∨ q)| = max(|G(p)|, |G(q)|) + |G(¬ q ∧ p)|.

and also
where the symbols ∧, ∨, ¬ in the right side of the
equations are the classical AND, OR, and NOT |G(p ∨ q)| = |G(p) ∪G(q)|, G(p ∧ q)| = |G(p) ∩
operations. G(q)|
Below these operations are written with explicit
indication of agents (in the first row): Corollary 1 (min/max properties of ∧ and ∨
operations for nested sets of agents)
v(¬ p) = ¬v1 (p),..., ¬vn (p), If G is a set of agents at the first order of con-
flicts such that G(q) ⊂ G(p) or G(p) ⊂ G(q) then

Below we present the important properties G(¬ p∧ q)| = ∅ or G(¬ q∧ p)| = ∅


of sets of conflicting agents at the first order of
conflicts. Let |G(x)| be the numbers of agents for |G(p ∧ q)| = min(|G(p)|, |G(q)|)
which proposition x is true.
Statement 1 sets up properties of the AND and |G(p ∨ q)| = max(|G(p)|, |G(q)|)
OR operations for nested sets of conflicting agents.
Statement 1 (general non min/max properties This follows from the statement 1. The corol-
of ∧ and ∨ operations) lary presents a well-known condition when the use
If G is a set of agents at the first order of of min, max operations has the clear justification.
conflicts and Let Gc(p) is a complement of G(p) in G: Gc(p)=
G \ G(p), G= G(p) ∪ Gc(p).
|G (q) | ≤ | G (p) | Statement 2. G = G(p) ∪ Gc(p)= G(p) ∪ G (¬ p).
Corollary 2. G (¬ p) = Gc(p)
then ∧ and ∨ logic operations satisfy the follow- It follows directly from Statement 2.
ing properties Statement 3. If G is a set of agents at the first
order of conflicts then
|G(p ∧ q)| = min(|G(p)|, |G(q)|) - |G(¬ p∧ q)|,
(4) G (p ∨ ¬ p) = G(p) ∪ G(¬ p) = G(p) ∪ Gc(p) =
G

55
Agents in Quantum and Neural Uncertainty

Figure 2. A set of total 10 agents with two different splits to G(p) and G(q) subsets (a) and (b)

G (p ∧ ¬ p) = G(p) ∩ G(¬ p) = G(p) ∩ Gc(p) = This follows from the properties of the sym-
∅ metric difference ⊕ (e.g.., Flament 1963).
Figure 2 illustrates a set of agents G(p) for
It follows from the definition of the first order which p is true and a set of agents G(q) for which
of conflict and statement 2. In other words, G (p q is true. In Figure 2(a) the number of agents
∧ ¬ p) = ∅ corresponds to the contradiction p ∧ for which truth values of p and q are different,
¬ p, that is always false and G (p ∨ ¬ p)= G cor- (¬ p ∧ q) ∨ (p ∧ ¬q), is equal to 2. These agents
responds to the tautology p ∨ ¬ p, that is always are represented by white squares. Therefore the
true in the first order conflict. distance between G(p) and G(q) is 2. Figure 2(b)
Let G1⊕G2 be a symmetric difference of sets shows other G(p) and G(q) sets with the number
of agents G1 and G2, of the agents for which ¬ p ∧ q) ∨ (p ∧ ¬q is true
G1⊕ G2 = (G1∩ G2c) ∪ (G1c ∩ G2)and let p⊕q equal to 6 (agents shown as white squares and
be the exclusive or of propositions p and q, squares with the grid). Thus, the distance between
p⊕q = (p ∧ ¬q) ∨ (¬ p ∧ q). the two sets is 6.
Consider, a set of agents G(p⊕q). It consists In Figure 2(a), set 2 consists of 2 agents |G((p
of agents for which values of p and q differ from ∧ ¬ q)| = 2 and set 4 is empty, |G((¬ p ∧ ¬ q)| =
each other, that is 0, thus D(Set2, Set4)=2. This emptiness means
G(p⊕q)= G ((p ∧ ¬q) ∨ (¬ p ∧ q)). that a set of agents with true p includes a set of
Below we use the number of agents in set agents with true q.
G(p⊕q) to define a measure of difference between In Figure 2(b), set 2 consists of 4 agents |G((p
statements p and q and a measure of difference ∧ ¬ q)| = 4 and set 4 consists of 2 agents, |G((¬ p
between sets of agents G(p) a G(q). ∧ ¬ q)| = 2, thus D(Set2,Set4)=6.
Definition. A measure of difference D(p,q) These splits produce different distances be-
between statements p and q and a measure of tween G(p) and G(q). The distance in the case (a)
difference D(G(p),G(q)) between sets of agents is equal to 2; the distance in the case (b) is equal
G(p) a G(q) are defined as follows: to 6. Set 1 (black circles) consists of agents for
D(p,q) = D(G(p),G(q))= |G(p)⊕ G(q)| which both p and q are false, Set 2 (white squares)
Statement 4. D(p,q) = D(G(p),G(q)) is a dis- consists of agents for which p is true but q is false.
tance, i.e., it satisfies distance axioms Set 3 (white circles) consists of agents for which
D(p,q) ≥ 0 p and q are true, and Set 4 (squares with grids)
D(p,q) = D(q,p) consists of agents for which p is false and q is true,
D(p,q)+D(q,h) ≥ D(p,h).

56
Agents in Quantum and Neural Uncertainty

Figure 3. Graph of the distances for three sen-


Evidence Theory and Rough
tences p1, p2, and p3
Set Theory with Agents

In the evidence theory any subset A ⊆ U of the


universe U is associated with a value m(A) called
a basic assignment probability such that

 0 D 1,2 ... D 1,N 



 
 D 1,2 0 ... D 2,N 
D=  
 ... ... ... .. 
 
D 1,N D 2,N ... 0 
 

For complex computations of logic values Respectively, the belief Bel(Ω) and plausibil-
provided by the agents we can use a graph of the ity Pl(Ω) measures for any set Ω are defined as
distances among the sentences p1, p2, ….,pN. For
example, for three sentences p1, p2, and p3, we 2N

have a graph of the distances shown in Figure 3. ∑ m(A ) = 1


k
This graph has can be represented by a distance k =1

matrix
Thus, Bel measure includes all subsets that


v(p ∧ q ) = 
g1 g2 ... gn −1 gn 

inside Ω, The Pl measure includes all sets with
v1(p) ∧ v1(q ) v2 (p) ∧ v2 (q ) ... vn −1(p) ∧ vn −1(q ) vn (p) ∧ vn (q )
 
non-empty intersection with Ω. In the evidence
g1 g2 ... gn −1 gn
v(p ∨ q ) =  
theory as in the probability theory we associate
v1(p) ∨ v1(q ) v2 (p) ∨ v2 (q ) ... vn −1(p) ∨ vn −1(q ) vn (p) ∨ vn (q )
 g

v(¬p) =  1
g2 ... gn −1 gn 
 one and only one agent with an element.
v1(¬p) v2 (¬p) ... vn −1 (¬p) vn (¬p)
Figure 4 shows set A at the border of the set Ω,
which is divided in two parts: one inside Ω and
which has a general form of a symmetric matrix,
another one outside it. For the belief measure,
 
we exclude the set A, thus we exclude the false
 0 D 1,2 D 1,3  state (f,f) and a logical self-conflicting state (t,f).
 
D= D 1,2
 0 D 2,3 
But for the plausibility measure we accept the (t,
D 0 
 1,3 D 2,3  t) and the self-conflicting state (f,t). In the belief
measure, we exclude any possible self-conflicting
Having distances Dij between propositions we state, but in the plausibility measure we accept
can use them to compute complex expressions in self-conflicting states.
agents’ logic operation. For instance, using 0 ≤ There are two different criteria to compute the
D(p, q) ≤ G(p) + G(q) and belief and plausibility measures in the evidence
theory. For belief, C1 criterion is related to set Ω,
G (p ∧ q) ≡ G (p ∨ q) – G ((¬ p ∧ q) ∨ (p ∧ ¬q)) thus C1 is true for the cases inside Ω. The second
≡ G (p ∨ q) – D(p, q) criterion C2 is related to set A, thus C2 is false for
the cases inside A. Now we can put in evidence
a logically self-conflicting state (t, f), where we
eliminate A also if it is inside Ω . The same is
applied to the plausibility, where C2 is true for
cases inside A. In this situation, we accept a

57
Agents in Quantum and Neural Uncertainty

Figure 4. Example of irrational agents for the belief theory

Figure 5. Example of irrational agents for the the same time. We call this locality phenomenon.
Rough sets theory In quantum mechanics, we have non-locality
phenomenon for which the same particle can be in
many different positions at the same time. This is a
clear violation of the ME principle for location. The
non-locality is essential for quantum phenomena.
Given proposition p = “The particle is in position
x in the space”, agent gi can say that p is true, but
agent gj can say that the same p is false.
Individual states of the particle include its posi-
tion, momentum and others. The complete state
of the particle is a superposition of the quantum
states of the particle wi at different positions xi
in the space, This superposition can be a global
logically self-conflicting situation (f, t), where wave function:
cases in Ω are false but cases in A are true. We
also use the logical self-conflict to study the roughs
set as shown in Figure 5.
Bel (Ω) = ∑ m(A ), k
Pl (Ω) = ∑ m(Ak )
Ak ⊆Ω Ak ∩Ω≠∅
Figure 5 shows set Ω that includes the logi-
cally self-conflicting states (f, t).
where y = w1 x 1 + w2 x 2 + ..... + wn x n
The internal and external part of Ω is defined
by a non empty frontier, where the couples [ t, t denotes the state at the position xi (D’Espagnat,
]are present and as well as the self-conflicting 1999). If we limit our consideration by an indi-
couples [ f, t ]. vidual quantum state of the particle and ignore
other states then we are in Mutual Exclusion
situation of the classical logic (the particle is at
QUANTUM MEChANICS: xi or not). In the same way in AUT, when we
SUPERPOSITION AND FIRST observe only one agent at the time the situation
ORDER CONFLICTS collapses to the classical logic that may not ad-
equately represent many-valued logic properties.
The Mutual Exclusive (ME) principle states that Having multiple states associated with the par-
an object cannot be in two different locations at ticle, we use AUT multi-valued logic as a way to

58
Agents in Quantum and Neural Uncertainty

model the multiplicity of states. This situation is Explanation:


very complex because measuring position xi of
the particle changes the superposition value ψ. The classical digital computer operates with bits
However, this situation can be modelled by the that is with 1 and 0. The quantum computer oper-
first order conflict, because each individual agent ates with two states nm = n m In quantum
has no self-conflict.
mechanics, the state 1 , 0 is associated with a

Quantum Computer field of probability that the bit assumes the value
1 in the space time. Thus, in the classical sense
The modern concept of the quantum computer the bit is 1 in a given place and time. In quantum
is based on following statements (Abbott, Doer- mechanics, we have the distribution of the prob-
ing, Caves, Lidar, Brandt, Hamilton, Ferry, Gea- ability that a bit assumes the value 1 in the space
Banacloche, Bezrukov, & Kish, 2003; DiVincenzo, time. Therefore we have a field of uncertainty for
2000; DiVincenzo, 1995; Feynman, 1982; Jaeger, value 1 as well as a field of uncertainty for value
2006; Nielsen & Chuang, 2000; Benenti, 2004; 0. This leads to the concept of the qubit.
Stolze & Suter, 2004; Vandersypen, Yannoni, &
Chuang, 2000; Hiroshi & Masahito, 2006): 5. Any space of 2m - 1 dimensions is represented
by H and UH is the unitary transformation
of H by which computations are made in
1. Any state denoted as x i is a field of com-
quantum physics.
plex numbers on the reference space (posi-
tion and time). In our notation, the quantum
state is a column of values for different points Explanation:
(objects).
2. Any combination of states n is a product H is a matrix of qubits that can assume different
of fields. values. For simplicity we use only the numbers 1
and 0, where 1 means qubit that assume the value
Given two atoms with independent states of 1 in the all space time and the same for 0.
energy nm = n m the two atoms have the In quantum mechanics, the qubit value can
be changed only with particular transformations
state n and m that is the product of the
U or unitary transformations. Only the unitary
separate state for atom 1 and atom 2. transformations can be applied. Given the unitary
transformation
3. The space H is the space which dimension
is the number of elementary fields or states.
1
4. In quantum mechanics, we compute the
probability that the qubit takes values 1 or
0 in the all space time. such that UT U = I and

A qubit has some similarities to a classical bit,  0


1 0 0
but is overall very different. Like a bit, a qubit can 0 1 0 0
have two possible values—normally a 0 or a 1. The U = 
0 0 0 1
difference is that whereas a bit must be either 0 or  
1, a qubit can be 0, 1, or a superposition of both.  0 0 1 0

59
Agents in Quantum and Neural Uncertainty

Figure 6. Quantum Computer circuit that repre-


with 22 = 4 qubits we have
sents the transformation
0 0

0 1
H =  (1)
1 0
 
1 1

Also

1 0 0 0  0 0  0 0



0 1 0 0  0 1  0 1
UH =  =
0 0 0 1 1 0 1 1
 
0 0 1 0 1 1 1 0
   
connection between unitary transformation and
is the XOR that can be written as c = a ⊕ b. permutation. In fact we have
In quantum mechanics, (1) is represented by
the Deutch circuit shown in Figure 6  0 0  0
Now we want to clarify and extend the Deut-    
 0 1  
ch’s interpretation of the quantum computer to   ⇒  1
 1 0  1
introduce not only a Boolean algebra in the quan-    
   
tum computer, but also a many-valued logic. This  1 1  0
is more in line with the quantum mechanics in- where
trinsic inconsistency due to the superposition of
mutually exclusive states. In quantum mechanics 1 0 0 0 0 0  0 0

a particle can have two opposite spins at the same 0 1 0 0 0 1  0 1
time and different positions again at the same Q = UH =  =
0 0 0 1 1 0 1 1
time. Here we cannot apply the mutual exclusive  
states of the classical physics where we have  0 0 1 0  1 1 1 0
   
logical consistency that is any particle has one and
and only one position.
P
 P2 P3 P4 
Permutations, Unitary U =  1 
 P1 P2 P4 P3 
Transformation and Boolean Algebra

In quantum mechanics any physical transforma- Now for


tion is governed by special transformations of
the Hilbert space of the states. For the unitary ( P1 , P2 , P3 , P4 ) = ( (0,0) , (0,1) , (1,0) , (1,1)
transformation U the probability of the quantum
system is invariant. The unitary transformation
has property U UT = 1 and is extensively used and
in the quantum computer. Now we establish a

60
Agents in Quantum and Neural Uncertainty

0  0  0 
    Q2 (X,1 ) =   = false
0   1 0 
Q1 (X,Y) =   = X , Q2 (X,Y ) =  
 1
P P P

¬X ∧ X =  1 2 3
P4 
 = (1, 2)(3, 4)
 1   P2 P1 P4 P3 
  0 
 1   and
false = ¬X ∧ X = {(1, 2)} ∪ {(3, 4)} = {(1, 2),(3, 4)} = Universe

Similarly, for the negation we have


Next we have
 1
Q2 (X,1 ) =   = ¬X 1
0   0 0 0 0 0 0 0
 P P P P  0 1 0 0 0 1 0 1

¬X =  1 2 3 4
 = (3, 4) Q = UH =  =
P1 P2 P4 P3  0 0 1 0 1 0 1 0
 
0 0 0 1 1 1 1 1
   
and
1
0 0  0 0 0 1
 1 0 Q2 (X,1 ) =   = true
1 0 0 0  0 1 0 0 1
Q = UH =  =
0 0 1 0 1 0 1 0
 
0 0 0 1 1 1 1 1 and
   
and
P P
 P3 P4 
¬X ∨ X =  1 2  = ∅
0  P1 P2 P3 P4 
Q2 (X,1 ) =   = X
 1
 P P P P  The quantum computer circuit can be explained
 4
X =  1 2 3
 = (1, 2) as follows. Having
P2 P1 P3 P4 
c = X ⊕ Y = (X ∧ ¬ Y) ∨ (¬ X ∧ Y) we infer
that
for
Y=0⇒X⊕Y=X
¬X = {(1, 2)} = {(3, 4)}
C

Y = 1 ⇒ X ⊕ Y = ¬X

Therefore, Thus, we have what is shown in Figure 7


Now we use a common symbolic representa-
0 1 0 0 0 0  0 1 tion of elementary states

1 0 0 1  0 false = ¬X ∨ X = {(1, 2)} ∩ {(3, 4)} = { } = ∅
0 0 0
Q = UH =  =
0 0 0 1 1 0 1 1 0 as unitary vectors in 2-D Hilbert space
 
 0 0 1 0 1 1 1 0

    1

61
Agents in Quantum and Neural Uncertainty

Figure 7. Quantum Computer circuit by NOT


elementary operations and must change our point
operation or “¬ “
of view. The quantum mechanics is based on the
idea of states, superposition of states, product of
states (entanglement or relationship among states)
and the Hilbert space of the states with unitary
transformation. Thus, the mathematical formalism
is closer to the vector space than to the Boolean
algebra. We remark that permutations can be taken
as generators of mathematical groups. This sets up
an interesting link between groups of symmetry
Any vector in this 2-D space is a superposition
and Boolean algebra, which can lead to a deeper
 1  0
1 =   , 0 =   that is symbolically rep- connection with the Boolean logic. In fact, given
 0 1 the permutation with a set of points P in 2-D space,
resented as every permutation of the states can generate only
the following set of Boolean functions
ψ =α 1 +β 0   1   1   0   0
1      1     
1 1   0   0 1 0  1  1
     
   
11 = 1 1 =   ⊗   =   =   , 10 = 1 0 =   ⊗   =     =   ,
   
   
0 0  1   0 0 1  0  0
Thus, 0   
  0   0
     
0   
  1    0
     
  1    0   0    0
0      0     
0 1  0  0 0 0  1   0
α       
01 = 0 1 =   ⊗   =     =   , 00 = 0 0 =   ⊗   =     =  
ψ =  
   
1  0  1  1 1 1  0   0
1
1    
  0  0  1  1
 β             

Points in 2-D Hilbert space for the four states To obtain the Boolean function NAND we
are must extend the space of the objects and introduce
a 3-D attribute space. Thus, we have the Toffoli
 1  0 transformation and the permutation
y = 1 =   , y = 0 =  
0 1
F = { X, ¬X , ¬X ∧ X , ¬X ∨ X ) }

and
For
 0 0  0 0 
   P P2 P3 P4 P5 P6 P7 P8 
 0 1  0 1 
U =  1  = (7, 8)
 P7 
H=  = P1 P2 P3 P4 P5 P6 P8
 1 0   1 0 
    and
 1 1   1 1 1 0 0 0 0 0 0 0
  
0 1 0 0 0 0 0 0

0 0 1 0 0 0 0 0
In the classical Boolean algebra, complex 
0 0
expressions are obtained by composition using  0 0 1 0 0 0
U=  
the elementary operations (NOT, AND, and OR). 0 0 0 0 1 0 0 0
 
In quantum computer, we cannot use the same 0 0 0 0 0 1 0 0
 
0 0 0 0 0 0 0 1
 
0 0 0 0 0 0 1 0
 
62
Agents in Quantum and Neural Uncertainty

and and

0  0  0    P P8 
0 0 0   P2 P3 P4 P5 P6 P7
      Y=  1  = (1, 2)(3, 4)
0  0   1 0 0 1 P2 P1 P4 P3 P5 P6 P7 P8 
       
0   1 0  0 1 0  or
       
        0 0 0 0 0  0 0 1 
0   1  1 0 1 1   1 0 0 0 0 0
X =   ,Y =   , Z =   , H =  = X Y Z  1
  
 1 0  0   1 0 0     0 0 0 0 0 0 0 0 0 1 0 0 0
        0
 1 0   1  1 0 1  0 0 1 0 0 0 0  0 1 0 0 1 1
        0
 1  1 0  1 1 0  0 1 0 0 0 0 0  0 1 1 0 1 0
        UP=   = 
 1  1  1  1 1 1 0 0 0 0 1 0 0 0 1 0 0  1 0 0
            
0 0 0 0 0 1 0 0 1 0 1  1 0 1
Now we put     
0 0 0 0 0 0 1 0  1 1 0  1 1 0
    
 0  0 0  0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0     
0 1 0 0 0 0 0 0  0 0 1  0 0 1

0 0 1 0 0 0 0 0  0 1 0  0 1 0

    
0 0 0 1 0 0 0 0  0 1 1  0 1 1
=
Q ( X ,Y, Z) = UH= 
0 0 0 0 1 0 0

0 1 0 0  1 0 0

Also
    
0 0 0 0 0 1 0 0 1 0 1  1 0 1
    
0 0 0 0 0 0 0 1  1 1 0  1 1 1 

0

0 1
 

1 1 1 0

  X=0 Y=0 Z=1 0
 0 0 0 0 0 1

1
   
 X=1 Y=0 Z=1 0
Q3  = =Y
 X=0 Y=1 Z=1 1
Thus,    
Z=1 1
 X=1 Y=1
  
 0  0  0
     
 0  0  1
      or
 0  1  0
     
 0  1  1
      P P2 P3 P4 P5 P6 P7 P8 
Q1 (X,Y,Z)=   =X , Q2 (X,Y,Z)=   =Y , Q3 (X,Y,Z)=     = (1, 2)(5, 6)
 1  0  0 X=  1
      P2 P1 P3 P4 P6 P5 P7 P8 
 1  0  1
     
 1  1  1
     
 1  1  0
     
and

Therefore,
0
 1 0 0 0 0 0 0 0 0 0  0 0 1
 X=0 1 0 0 0 0 0 0 0 0 0 1  0 0 0
 Y=0 Z=1 1 
 X=1 0 0 0 0  0 1 0
Y=0 Z=1 1  0 1 0 0 0 0 1
Q3  = = ¬(X ∧ Y ) = X NAND Y     
 X=0 Y=1 Z=1 1 0 0 0 1 0 0 0 0  0 1 1  0 1 1 
    UP=   = 
 X=1 Y=1 Z=1 0 0 0 0 0 0 1 0 0  1 0 0  1 0 1 
       
0 0 0 0 1 0 0 0  1 0 1  1 0 0 
The permutation of (7,8) introduces a zero,     
0 0 0 0 0 0 1 0  1 1 0  1 1 0 
this Q3 (1, 1, 1) = 0 .     
0 0 0 0 0 0 0 1 1 1 1 1 1 1
For the permutation we have     

Q3 (X ,Y , 1) = ¬(X ∧ Y ) = X NAND Y
For the operation X ∧ Y, we have the permu-
tation

63
Agents in Quantum and Neural Uncertainty

 X=0 Y=0 Z=1 0 and



 X=1 Y=0 Z=1 1
Q3  = =X X = {(1,2),(5,6)} , X = {(1,2),(3,4)}
 X=0 Y=1 Z=1 0
 X ∨ Y = {(1,2),(5,6)} ∩ {(1,2),(3, 4)} = {(1, 2)}
 X=1 Y=1 Z=1 1
   or
 0 1 0 0 0 0 0 0  0 0 0  0 0 1
   
 1 0 0 0 0 0 0 0  0 0 1   0 0
because     0
 0 0 1 0 0 0 0 0  0 1 0  0 1 0
   
P  0 0 0 1 0 0 0 0    
 P2 P3 P4 P5 P6 P7 P8     0 1 1  0 1 1
X ∧ Y=  1  = (1, 2)(3, 4)(5, 6) UP=   = 
P2 P1 P4 P3 P6 P5 P7 P8   0 0 0 0 1 0 0 0   1 0 0  1 0 0
    
 0 0 0 0 0 1 0 0  1 0 1   1 0 1
    
 0 0 0 0 0 0 1 0   1 1 0  1 1 0
    
or  0 0 0 0 0 0 0 1   1 1 1  1 1 1
     

X = (1,2)(5,6) and Y = (1,2)(3,4)


X ∧ Y= {(1,2),(5,6)} ∪ {(1,2),(3,4)} = {(1,2),(3,4),(5,6)} For the negation operation, we generate a
complementary set of permutations and

and  X=0 Y=0 Z=1 0



 X=1 Y=0 Z=1 1
0
 1 0 0 0 0 0 0 0 0 0  0 0 1 Q3  = = X ∨Y
1 0 0 0 0 0 0 0 0 0 1  0 0 0  X=0 Y=1 Z=1 1
    
0 0 0 0  0 1 1
 0 0 1 0 0 0 1  X=1 Y=1 Z=1 1
  
    
0 0 1 0 0 0 0 0  0 1 1  0 1 0 
UP=   = 
0 0 0 0 0 1 0 0  1 0 0  1 0 1 

0 0 0 0 1 0 0

0  1 0
 
1  1 0 0 
 or
    
0 0 0 0 0 0 1 0  1 1 0  1 1 0 
     P P P P4 P5 P6 P7 P8 
0 0 0 0 0 0 0 1 1 1 1 1 1 1   = {(3, 4)(7, 8)}
     ¬X=  1 2 3

P1 P2 P4 P3 P5 P6 P8 P7 
X = {(1,2),(5,6)}
C
¬X = {(1,2),(5,6)} = {(3,4),(7,8)}
For the operation X ∨ Y, we have the permu-
tation
1 0 0 0 0 0 0 0 0 0 0   0 0 0
 X=0 Y=0 Z=1 0    
 0 1 0 0 0 0 0 0 0 0 1 0 0 1
 X=1 
Y=0 Z=1  0 0 0 0 0 0 1 1
Q3  = = X ∧Y  0 0 1 0 0 0 1
 X=0 Y=1 Z=1  0 0
 0 1 0 0 0 0 0 0 1 1 0 1 0
    UP=   = 
 X=1 Y=1 Z=1 1 0 0 0 0 1 0 0 0  1 0 0  1 0 0
   
0 0 0 0 0 1 0

0  1 0
 
1  1 0 1

    
0 0 0 0 0 0 0 1 1 1 0  1 1 1 
    
and because 0 0 0 0 0 0 1 0 1 1 1 1 1 0
    
P P2 P3 P4 P5 P6 P7 P8 
  = (1, 2)
X ∨ Y=  1
P2 P1 P3 P4 P5 P6 P7 P8  For contradiction we have

64
Agents in Quantum and Neural Uncertainty

 X=0 Y=0 Z=1 1 Thus,



 X=1 Y=0 Z=1 0
Q3  = = ¬X P P8 
Z=1 1
 P2 P3 P4 P5 P6 P7
¬(X ∧ Y)=  1  = (7, 8) = XNANDY
 X=0 Y=1 P2 P1 P4 P3 P6 P5 P7 P8 

 X=1 Y=1 Z=1 0 ¬(X ∧ Y)= {(1,2),(3,4),(5,6)} = {(7, 8)}
C

   or

1 0 0 0 0 0 0 0  0 0 0  0 0 0
0 1 0 0 0 0 0 0  0 1  0 0 1
Thus, we have the Unitary transformation 
0 0 1 0 0 0 0
0
 0  0 1 0  0 1 0
    
0 0 0 1 0 0 0 0  0 1 1  0 1 1
UP=   = 
 X=0 Y=0 Z=1 0 0 0 0 0 1 0 0 0  1 0 0   1 0 0
 
0 0 0 0 0 1 0

0  1 0
 
1  1 0 1

 X=1 Y=0 Z=1 0     
Q3 
0 0 0 0 0 0 0 1  1 1 0  1 1 1 
= = ¬X ∧ X     
 X=0 Y=1 Z=1 0 0 0 0 0 0 0 1
 0 1

1 1 1 1 0
  
 
 X=1 Y=1 Z=1 0
  
Now having permutations and rules to compute
AND, OR and NOT we can generate any function
with two variables X and Y. For example,
and
0 1 0 0 0 0 0 0 0 0 0  0 0 1  P P P P P P P P 
  8
1 0 0 0 0 0 0 0 0 0 1  0 0 0 ¬X =  1 2 3 4 5 6 7
 = {(3,4),((7,8)}
  P1 P2 P4 P3 P5 P6 P8 P7 
0 0 0 1 0 0 0 0 0 1 0  0 1 1  P P P P P P P P 
  8
     ¬Y =  1 2 3 4 5 6 7
 = {(5,6),(7,8)}
0 0 1 0 0 0 0 0  0 1 1  0 1 0   P1 P2 P3 P4 P6 P5 P8 P7 
UP=   = 
0 0 0 0 0 1 0 0  1 0 0  1 0 1  ¬X ∨ ¬Y = {(3,4),(7,8)} ∩ {(5,6)(7,8)} = {(7,8)} = ¬(X ∧ Y)
    
0 0 0 0 1 0 0 0  1 0 1  1 0 0 
    
0 0 0 0 0 0 0 1 1 1 0  1 1 1 
    
0 0 0 0 0 0 1 0 1 1 1 1 1 0
     Finally in quantum computer, given set of
This is due to elementary permutation

P
 P2 P3 P4 P5 P6 P7 P8 
 = (1, 2)(3, 4)(5, 6)(7, 8)
Γ = {(1,2), (3,4), (5,6), (7,8)}
X ∧ ¬X =  1
P2 P1 P3 P3 P6 P5 P8 P7 

any Boolean function of two variables is a subset


of Γ.
and Agents theory AUT in the quantum interpreta-
tion of logic. For the quantum mechanics the true
X = {(1, 2),(5, 6)} , ¬X = {(3, 4),(7, 8)} and false value or qbit is given by the elementary
vectors

 P P P P P P P P 
Next by using the De Morgan rule for negation ¬X =  1
 2 3 4 5 6 7 8
 = {(3, 4), (7, 8)}
 P1 P2 P4 P3 P5 P6 P8 P7 
X ∧ ¬X = {(1, 2),(5, 6)} ∪ {(3, 4),(7, 8)} = {(1, 2),(3, 4),(5, 6),(7, 8)}
 P P P P P P P P 
we get Y =  1 2 3 4 5 6 7 8
 = {((1,2),(3,4)}
 P1 P2 P3 P4 P6 P5 P8 P7 
X → Y = ¬X ∨ Y = {(3,4),(7,8)} ∩ {(1,2)(3,4)} = {(3,4)}
¬(X ∧ Y) =(¬X ∨ ¬Y)

65
Agents in Quantum and Neural Uncertainty

where y

 1 0   0 1  1
true =   , false =   =    where
 
0   1 1 0 0 
 qagent qagent2 .... qagentn 
v(p) =  1

We define a quantum agent as any entity that  v v2 .... vn 


1 
can define the quantum logic values
ψ = α true + β false
α  vk ∈ {(1,2),(5,6)}
now ψ is function of the parameters  
 β 
 1 and
If these parameters are equal to   then
0  ,
 qagent qagent2 .... qagentn 
ψ = true v(q) =  1

0   h1 h2 .... hn 

If these parameters are equal to   then
 1 Now we use the previous definition of the
ψ = false logic in quantum mechanics, thus
where false can be spin down and true can be spin
up. Now for different particles a set of logic val- hk ∈ {(1,2),(3,4)}
ues v(p) can be defined by using quantum agents,
where
y = true , y = false
 qagent qagent2 .... qagentn 
v(p ∧ q) =  1
v
 1 ∧ h v2 ∧ h2 .... vn ∧ hn 
1 
and p is a logic proposition, that is p = “ the par-
ticle is in the state
v 
 1 Now for
v   1 0  
v(p) =  2  where n k ∈ {true, false} =   ,   
 ...  0   1 
    
vk ∧ hk ∈ {(1,2),(5,6)} ∪ {(1,2),(3,4)} = {(1,2),(3,4),(5,6)}
 
n n 
”. For example the particle can be in the state up
where
for the spin or the particle has the velocity v and
so on.
Definition.A qagent is an agent that performs
 qagent qagent2 .... qagentn 
v(p ∨ q) =  1

a quantum measure. v
 1 ∨ h v2 ∨ h2 .... vn ∨ hn 
1 
The quantum agents or “qagent” and the
evaluation vector can be written in the language
of AUT in this usual way Finally, we have the logic vector

66
Agents in Quantum and Neural Uncertainty

Table 1. Logic operation AND by qagents with quantum states 1 and 0

 qagent qagent2   qagent   qagent 


p∧q
 1   1 qagent2
  1 qagent2

1Ù1 , 0 Ù 0 , 1Ù 0 , 0 Ù1 n1 = 1 n2 = 1  n1 = 1 n2 = 0  n1 = 0 n2 = 1 

 qagent   qagent qagent2   qagent qagent2   qagent   qagent 


 1 qagent2
  1   1   1 qagent2
  1 qagent2

n1 = 0 n2 = 0  n1 = 1 n2 = 1  n1 = 1 n2 = 1  n1 = 1 n2 = 0  n1 = 0 n2 = 1 

 qagent   qagent qagent2   qagent qagent2   qagent qagent2   qagent 


 1 qagent2
  1   1   1   1 qagent2

n1 = 0 n2 = 0  n1 = 1 n2 = 0  n1 = 1 n2 = 0  n1 = 1 n2 = 0  n1 = 0 n2 = 0 

 qagent   qagent qagent2   qagent qagent2   qagent   qagent qagent2 


 1 qagent2
  1   1   1 qagent2
  1 
n1 = 0 n2 = 0  n1 = 0 n2 = 1  n1 = 0 n2 = 1  n1 = 0 n2 = 0  n1 = 0 n2 = 1 

 qagent qagent2   qagent qagent2   qagent qagent2   qagent qagent2   qagent qagent2 
 1   1   1   1   1 
n1 = 0 n2 = 0  n1 = 0 n2 = 0  n1 = 0 n2 = 0  n1 = 0 n2 = 0  n1 = 0 n2 = 0 

For instance, for two conflicting qagents we


vk ∨ hk ∈ {(1,2),(5,6)} ∩ {(1,2),(3,4)} = {(1,2)}
have the superposition

m(p • q ) = p • q = w1 S1 + w 2 S 2 + ...... + wn S n

where S1, …,Sn are logic value true and false


defined by logic operation
Thus, the AUT establishes a bridge between
For example we have
the quantum computer based on the Boolean
logic with the many-valued logic and conflicts
 qagent qagent2 .... qagentn 
v(p • q) =  1 based on the quantum superposition phenomena.
 S1 S2 .... Sn  These processes are represented by fusion process

in AUT. The example of many-valued logic in
for which p • q = p ∨ q, that is represented by the quantum computer is given in Table 1. In the
subset S = { (1, 2)}, S1, …,Sn are different logic table 1 any qagent can assume states
values of the operation union generated by the m(p) = p = w1 S1 + w2 S 2 = w1 1 + w2 0
unitary matrix in quantum mechanics and the for a single proposition. For two different propo-
quantum measure. Now with the agent interpre- sitions p and q, a qagent can assume one of the
tation of the quantum mechanics, we can use the following possible states:
quantum superposition principle and obtain the
aggregation of the qagents logic values. 1 or 0

 qagent qagent2 .... qagentn 


v(p ∨ q) =  1
Also for the superposition we have Table 2.
 S S2 .... Sn  Now we can assign a fractional value to the
1 
true logic value as shown in Table 3.

67
Agents in Quantum and Neural Uncertainty

Table 2. Quantum superposition as logic fusion

 qagent   qagent qagent2   qagent   qagent qagent2 


p∧q
 1 qagent2
  1   1 qagent2
  1 
n1 = 0 n2 = 0  n1 = 1 n2 = 1  n1 = 1 n2 = 0  n1 = 0 n2 = 1 

 qagent qagent2   qagent qagent2 


 1   1  α 1 +β 1 α 1 +β 0 α 0 +β 1
n1 = 0 n2 = 0  n1 = 1 n2 = 1 

 qagent qagent2 
α 0 +β 0  1  α 1 +β 0 α 1 +β 0 α 0 +β 0
n1 = 1 n2 = 0 

 qagent qagent2 
α 0 +β 0  1  α 0 +β 1 α 0 +β 0 α 0 +β 1
n1 = 0 n2 = 1 

 qagent qagent2 
α 0 +β 0  1  α 0 +β 0 α 0 +β 0 α 0 +β 0
n1 = 0 n2 = 0 

Table 3. Many-valued logic in the quantum system

 qagent qagent2   qagent   qagent qagent2 


p∧q α 0 +β 0  1   1 qagent2
  1 
n1 = 1 n2 = 1  n1 = 1 n2 = 0  n1 = 0 n2 = 1 

 qagent qagent2 
 1  True ½ True ½ True False
n1 = 0 n2 = 0 

 qagent qagent2 
 1  ½ True ½ True False False
n1 = 1 n2 = 1 

 qagent qagent2 
 1  ½ True False ½ True False
n1 = 1 n2 = 0 

 qagent qagent2 
 1  False False False False
n1 = 0 n2 = 1 

Correlation (entanglement) in quantum me- than dependence. We can view the quantum cor-
chanics and second order conflict. Consider two relation as a conflicting state because we know
interacting particles as agents. These agents are from quantum mechanics that there is a correlation
interdependent. Their correlation is independent but when we try to measure the correlation, we
from any physical communication by any type of cannot check the correlation itself. The measure-
fields. We can say that this correlation is rather a ment process destroys the correlation. Thus, if the
characteristic of a logical state of the particles spin of one electron is up and the spin of another

68
Agents in Quantum and Neural Uncertainty

Figure 8. Many-valued logic in the AUT neural network

Figure 9. Many-valued logic operation AND in the AUT neural network

electron is down the first spin is changed when putation of many-valued logic operations used to
we have correlation or entanglement. It generates model uncertainty process. The traditional neural
the change of the other spin instantaneously and networks model Boolean operations and classical
this is manifested in a statistic correlation differ- logic. In a new fuzzy neural network, we combine
ent from zero. For more explanation see two logic levels (classical and fuzzy) in the same
D’Espagnat (1999). neural network as presented in Figure 8.
Figures 9-11 show that at the first level we
have the ordinary Boolean operations (AND, OR,
NEURAL IMAGE OF AUT and NOT). At the second level, the network
fuses results of different Boolean operations. As
In this section, we show the possibility for a new a result, a many value logic value is generated as
type of neural network based on the AUT. This Figure 8 shows.
type of the neural network is dedicated to com-

69
Agents in Quantum and Neural Uncertainty

Figure 10. AUT Many-valued logic operation OR in the AUT neural network

Figure 11.Many-valued logic operation NOT in the AUT neural network

Now we show an example of many-valued In Table 5 we use the aggregation rule that can
logic operation AND with the agents and fusion generate a many-valued logic structure with the
in the neural network. Table 4 presents the indi- following three logic values:
vidual AND operation for a single agent in the nagent nagent 
 2 with equivalent notations
false 
1
population of two agents.  false 
 true false 
Ω = true, = , false
 qagent qagent2   2 2 
 1 
n1 = 0 n2 = 0  Now, having the commutative rule we derive,
false + false
= false
2
With the fusion process in AUT we have the The previous composition rule can be written in
many-valued logic in the neural network. the simple form shown in Table 6 where different

70
Agents in Quantum and Neural Uncertainty

Table 4. Agent Boolean logical rule

nagent nagent2 
 nagent nagent2 
 nagent nagent2 

  
true  false  true 
1 1 1
p∧q  Agent Agent2 ... AgentN   true   true   false 
V (p ∨ q ) =  1 
u1(p) ∨ u1(q ) u2 (p) ∨ u2 (q ) ... uN (p) ∨ uN (q )

nagent nagent2 
 nagent nagent2 
 nagent nagent2 
 nagent nagent2 
 nagent nagent2 

    
false  true  true  false  true 
1 1 1 1 1
 false   true   true   true   false 

nagent nagent2 
 nagent nagent2 
 nagent nagent2 
 nagent nagent2 
 nagent nagent2 

    
false  false  false  false  false 
1 1 1 1 1
 false   true   true   true   false 

nagent nagent2 
 nagent nagent2 
 nagent nagent  nagent nagent2 
 nagent nagent2 

   false 1 2  
false  true  false  true 
1 1 1 1
 false   false   true 
  false   false 

nagent nagent2 
 nagent nagent  nagent nagent  nagent nagent2 
 nagent nagent 
  false 1 2  false 1 2   false 1 2
false  false 
1 1
 false   false 
  false 
  false   false 


Table 5. Neuronal fusion process

true + true true + false true false + true true


p∧q = true = =
true + false
2
=
false + true
2
=
true
2
=
false
2
=
1
2
1
true = false
2 2 2 2 2 2

false + false true + true true + true true + false true false + true true
= false = true = true = =
2 2 2 2 2 2 2

false + false true + false true true + false true true + false true false + false
= false = = = = false
2 2 2 2 2 2 2 2

false + false false + true true false + true true false + false false + true true
= false = = = false =
2 2 2 2 2 2 2 2

false + false false + false false + false false + false false + false
= false = false = false = false = false
2 2 2 2 2

results for the same pair of elements are located for p = ½ true and q = ½ true.
in the same cell. In this case we have no criteria to choose one or
Table 6 contains two different results for the the other. Here the operation AND is not uniquely
AND operation, defined, we have two possible results one is false
and the other is ½ true. The first one is shown in
false + false Table 7 and the second one is shown in Table 8.
= false
2 The neuron image of the previous operation
is shown in Figure 12.

71
Agents in Quantum and Neural Uncertainty

Table 6. Simplification of the Table 5

true + true true


p∧q true + false
=
false + true
=
true = true
2 2 2 2 2

false + false true + true true + true true + false true


= false = true = true =
2 2 2 2 2

false + false true false + true true false + true true


= false = =
2 2 2 2 2 2
false + false
= false
2

false + false false + false false + false false + false


= false = false = false = false
2 2 2 2

Table 7. First operation

true + true true


p∧q p ∧q =
true

true 
= false,
true 

= true
2 2  2  2 2

false + false true + true true + true true + false true


= false = true = true =
2 2 2 2 2

false + false true false + true true false + false


= false = = false
2 2 2 2 2

false + false false + false false + false false + false


= false = false = false = false
2 2 2 2

Table 8. Second operation

false + false true + true true


p∧q = false = true
2 2 2

false + false true + true true + true true + false true


= false = true = true =
2 2 2 2 2

false + false true false + true true true


= false =
2 2 2 2 2

false + false false + false false + false false + false


= false = false = false = false
2 2 2 2

72
Agents in Quantum and Neural Uncertainty

Figure 12. The neuron image of the operation presented in Table 6

CONCLUSION similar way as we did for the quantum mechanics


and quantum computer in this chapter. Now with
In this chapter, we have summarized the Agent- the AUT, it is possible to generate neuron models
based Uncertainty Theory (AUT) and had shown and fuzzy neuron models that deal with intrinsic
how it can model the inconsistent logic values of conflicting situations and inconsistency. The AUT
the quantum computer and represent a new many- opens the opportunities to rebuild previous models
valued logic neuron. of uncertainty with the explicit and consistent
We demonstrated that the AUT modeling of the introduction of the conflicting and inconsistent
inconsistent logic values of the quantum computer phenomena by the means of the many-valued logic.
is a way to solve this old problem of inconsistency
in quantum mechanics and quantum computer with
mutually exclusive states. The general classical ap- REFERENCES
proach is based on states and differential equations
without introduction of elaborated logic analysis. Abbott, A., Doering, C., Caves, C., Lidar, D.,
In the quantum computer, the unitary transforma- Brandt, H., & Hamilton, A. (2003). Dreams
tion has been introduced in the literature as media versus Reality: Plenary Debate Session on
to represent classical logic operations in the form Quantum Computing. Quantum Informa-
of Boolean calculus in the quantum phenomena. tion Processing, 2(6), 449–472. doi:10.1023/
In the quantum computer, superposition gives us a B:QINP.0000042203.24782.9a
new type of logic operation for which the mutual Atanassov, K. T. (1999). Intuitionistic Fuzzy Sets,
exclusion principle is not true. The same particle Physica Verlag. Heidelberg: Springer.
can be in two different positions in the same time.
This chapter explained in a formal way how the Baki, B., Bouzid, M., Ligęza, A., & Mouaddib,
quantum phenomena of the superposition can be A. (2006). A centralized planning technique
formalized by the AUT with temporal constraints and uncertainty for
Many uncertainty theories emerged such as multi-agent systems. Journal of Experimental &
fuzzy set theory, rough set theory, and evidence Theoretical Artificial Intelligence, 18(3), 331–364.
theory. The AUT allows reformulating them in a doi:10.1080/09528130600906340

73
Agents in Quantum and Neural Uncertainty

Benenti, G. (2004). Principles of Quantum Com- Ferber, J. (1999). Multi Agent Systems. Addison
putation and Information (Vol. 1). New Jersey: Wesley.
World Scientific.
Feynman, R. (1982). Simulating physics with
Carnap, R., & Jeffrey, R. (1971). Studies in Induc- computers. International Journal of Theoretical
tive Logics and Probability (Vol. 1, pp. 35–165). Physics, 21, 467. doi:10.1007/BF02650179
Berkeley, CA: University of California Press.
Flament, C. (1963). Applications of graphs theory
Chalkiadakis, G., & Boutilier, C. (2008). Se- to group structure. London: Prentice Hall.
quential Decision Making in Repeated Coalition
Gigerenzer, G., & Selten, R. (2002). Bounded
Formation under Uncertainty, In: Proc. of 7th Int.
Rationality. Cambridge: The MIT Press.
Conf. on Autonomous Agents and Multi-agent
Systems (AA-MAS 2008), Padgham, Parkes, Mül- Halpern, J. (2005). Reasoning about uncertainty.
ler and Parsons (eds.), May, 12-16, 2008, Estoril, MIT Press.
Portugal, http://eprints.ecs.soton.ac.uk/15174/1/
Harmanec, D., Resconi, G., Klir, G. J., & Pan,
BayesRLCF08.pdf
Y. (1995). On the computation of uncertainty
Colyvan, M. (2004). The Philosophical Signifi- measure in Dempster-Shafer theory. International
cance of Cox’s Theorem. International Journal Journal of General Systems, 25(2), 153–163.
of Approximate Reasoning, 37(1), 71–85. doi:10.1080/03081079608945140
doi:10.1016/j.ijar.2003.11.001
Hiroshi, I., & Masahito, H. (2006). Quantum
Colyvan, M. (2008). Is Probability the Only Coher- Computation and Information. Berlin: Springer.
ent Approach to Uncertainty? Risk Analysis, 28,
Hisdal, E. (1998). Logical Structures for Repre-
645–652. doi:10.1111/j.1539-6924.2008.01058.x
sentation of Knowledge and Uncertainty. Springer.
D’Espagnat, B. (1999). Conceptual Foundation
Jaeger, G. (2006). Quantum Information: An
of Quantum mechanics (2nd ed.). Perseus Books.
Overview. Berlin: Springer.
DiVincenzo, D. (1995). Quantum Computation.
Kahneman, D. (2003). Maps of Bounded Rational-
Science, 270(5234), 255–261. doi:10.1126/sci-
ity: Psychology for Behavioral Economics. The
ence.270.5234.255
American Economic Review, 93(5), 1449–1475.
DiVincenzo, D. (2000). The Physical Imple- doi:10.1257/000282803322655392
mentation of Quantum Computation. Experi-
Kovalerchuk, B. (1990). Analysis of Gaines’ logic
mental Proposals for Quantum Computation.
of uncertainty, In I.B. Turksen (Ed.), Proceedings
arXiv:quant-ph/0002077
of NAFIPS ’90 (Vol. 2, pp. 293-295).
Edmonds, B. (2002). Review of Reasoning about
Kovalerchuk, B. (1996). Context spaces as neces-
Rational Agents by Michael Wooldridge. Journal
sary frames for correct approximate reasoning.
of Artificial Societies and Social Simulation, 5(1).
International Journal of General Systems, 25(1),
Retrieved from http://jasss.soc.surrey.ac.uk/5/1/
61–80. doi:10.1080/03081079608945135
reviews/edmonds.html.
Kovalerchuk, B., & Vityaev, E. (2000). Data min-
Fagin, R., & Halpern, J. (1994). Reasoning about
ing in finance: advances in relational and hybrid
Knowledge and Probability. Journal of the ACM,
methods. Kluwer.
41(2), 340–367. doi:10.1145/174652.174658

74
Agents in Quantum and Neural Uncertainty

Montero, J., Gomez, D., & Bustine, H. (2007). On Ruspini, E. H. (1999). A new approach to clus-
the relevance of some families of fuzzy sets. Fuzzy tering. Information and Control, 15, 22–32.
Sets and Systems, 16, 2429–2442. doi:10.1016/j. doi:10.1016/S0019-9958(69)90591-9
fss.2007.04.021
Stolze, J., & Suter, D. (2004). Quantum Comput-
Nielsen, M., & Chuang, I. (2000). Quantum Com- ing. Wiley-VCH. doi:10.1002/9783527617760
putation and Quantum Information. Cambridge:
Sun, R., & Qi, D. (2001). Rationality Assumptions
Cambridge University Press.
and Optimality of Co-learning, In Design and
Priest, G., & Tanaka, K. Paraconsistent Logic. Applications of Intelligent Agents (LNCS 1881,
(2004). Stanford Encyclopedia of Philosophy. pp. 61-75). Berlin/Heidelberg: Springer.
http://plato.stanford.edu/entries/logic-paracon-
van Dinther, C. (2007). Adaptive Bidding in
sistent.
Single-Sided Auctions under Uncertainty: An
Resconi, G., & Jain, L. (2004). Intelligent agents. Agent-based Approach in Market Engineering
Springer Verlag. (Whitestein Series in Software Agent Technologies
and Autonomic Computing). Basel: Birkhäuser.
Resconi, G., Klir, G. J., Harmanec, D., & St.
Clair, U. (1996). Interpretation of various un- Vandersypen, L.M.K., Yannoni, C.S., & Chuang,
certainty theories using models of modal logic: I.L. (2000). Liquid state NMR Quantum Comput-
a summary. Fuzzy Sets and Systems, 80, 7–14. ing.
doi:10.1016/0165-0114(95)00262-6
Von-Wun Soo. (2000). Agent Negotiation under
Resconi, G., Klir, G. J., & St. Clair, U. (1992). Hier- Uncertainty and Risk In Design and Applications
archical uncertainty metatheory based upon modal of Intelligent Agents (LNCS 1881, pp. 31-45).
logic. International Journal of General Systems, Berlin/Heidelberg: Springer.
21, 23–50. doi:10.1080/03081079208945051
Wooldridge, M. (2000). Reasoning about Rational
Resconi, G., Klir, G.J., St. Clair, U., & Harmanec, Agents. Cambridge, MA: The MIT Press.
D. (1993). The integration of uncertainty theories.
Wu, W., Ekaette, E., & Far, B. H. (2003). Un-
Intern. J. Uncertainty Fuzziness knowledge-Based
certainty Management Framework for Multi-
Systems, 1, 1-18.
Agent System, Proceedings of ATS http://www.
Resconi, G., & Kovalerchuk, B. (2006). The Logic enel.ucalgary.ca/People/far/pub/papers/2003/
of Uncertainty with Irrational Agents In Proc. ATS2003-06.pdf
of JCIS-2006 Advances in Intelligent Systems
Research, Taiwan. Atlantis Press
Resconi, G., Murai, T., & Shimbo, M. (2000). KEY TERMS AND DEFINITIONS
Field Theory and Modal Logic by Semantic field
to make Uncertainty Emerge from Information. Logic of Uncertainty: A field that deals with
International Journal of General Systems, 29(5), logic aspects of uncertainty modeling.
737–782. doi:10.1080/03081070008960971 Conflicting Agents: Agents that have self-
conflict or conflict with other agents in judgment
Resconi, G., & Turksen, I. B. (2001). Canonical of truth of specific statements.
Forms of Fuzzy Truthoods by Meta-Theory Based Fuzzy Logic: A field that deals with modeling
Upon Modal Logic. Information Sciences, 131, uncertainty based on Zadeh’s fuzzy sets.
157–194. doi:10.1016/S0020-0255(00)00095-5

75
Agents in Quantum and Neural Uncertainty

The Agent–Based Uncertainty Theory: Quantum Computer: Used in this chapter


(AUT): A theory that model uncertainty using to denote any systems that provide computations
the concept of conflicting agents. based on qubits.
Neural Network: Used in this chapter to Fusion: Used in this chapter to denote a fu-
denote any type of an artificial neural network. sion of multi-dimensional conflicting judgments
of agents.

76
Section 3
Bio-Inspired Agent-Based
Artificial Markets
78

Chapter 5
Bounded Rationality and
Market Micro-Behaviors:
Case Studies Based on Agent-
Based Double Auction Markets
Shu-Heng Chen
National Chengchi University, Taiwan

Ren-Jie Zeng
Taiwan Institute of Economic Research, Taiwan

Tina Yu
Memorial University of Newfoundland, Canada

Shu G. Wang
National Chengchi University, Taiwan

ABSTRACT
We investigate the dynamics of trader behaviors using an agent-based genetic programming system
to simulate double-auction markets. The objective of this study is two-fold. First, we seek to evaluate
how, if any, the difference in trader rationality/intelligence influences trading behavior. Second, besides
rationality, we also analyze how, if any, the co-evolution between two learnable traders impacts their
trading behaviors. We have found that traders with different degrees of rationality may exhibit different
behavior depending on the type of market they are in. When the market has a profit zone to explore, the
more intelligent trader demonstrates more intelligent behaviors. Also, when the market has two learnable
buyers, their co-evolution produced more profitable transactions than when there was only one learn-
able buyer in the market. We have analyzed the trading strategies and found the learning behaviors are
very similar to humans in decision-making. We plan to conduct human subject experiments to validate
these results in the near future.

DOI: 10.4018/978-1-60566-898-7.ch005

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Bounded Rationality and Market Micro-Behaviors

INTRODUCTION behaviors. In particular, would a rule create the


opportunity for an auctioneer to engage in unfair
It is not from the benevolence of the butcher, the bidding practices? If so, how can we prevent them
brewer, or the baker that we expect our dinner, from happening?
but from their regard to their own interest. (Adam This type of preventive study is not new in the
Smith, The Wealth of Nations,1776) Internet auction market business. For example, to
prevent “sniping” (the act of submitting a slightly
higher bid than the current one at the last-minute),
In the classic An Inquiry into the Natures and eBay has incorporated software agents in the In-
Causes of the Wealth of Nations, the great econo- ternet bidding process. There, each auctioneer is
mist Adam Smith demonstrated that an individual asked to provide his/her highest bid to an assigned
pursuing his own self-interest also promotes the agent, who then carries out the auction on his/her
good of his community as a whole, through a behalf. By contrast, Amazon adopts a different
principle that he referred to as “invisible hand”. approach by extending the auction period for 10
Since then, the study of individual behaviors in more minutes if a sniper appears at the end of an
a market economy has evolved into the field of auction (Roth & Ockenfels, 2002). This type of
microeconomics. preventive study is important in order to design
In a standard market, buyers and sellers in- fair and successful auction markets.
teract to determine the price of a commodity or In this chapter, we present our work using an
service. During the trading process, individuals agent-based genetic programming (GP) system
maximize their own profits by adopting different (Chen & Tai, 2003) to analyze the behavior of
strategies based on their experiences, familiarity traders with different degrees of rationality in
with the commodity and the information they an artificial double-auction (DA) market. This
acquired. These differences in individual quali- approach is different from that of experimen-
ties in decision-making can also be explained by tal economics (Smith, 1976) in that instead of
the concept of bounded rationality introduced by conducting experiments using human subjects,
Herbert Simon (1997), who pointed out that per- software agents are used to represent traders and
fectly rational decisions are often not feasible in to conduct market simulations under controlled
practice due to the finite computational resources settings. This paradigm of agent-based compu-
available for making them. As a result, humans tational economics complements experimental
employ heuristics to make decisions rather than a economics to advance our knowledge of the
strict rigid rule of optimization. The difference in dynamics of micro market behavior.
human qualities in decision-making is referred to The rest of the chapter is organized as fol-
as the degree of rationality or intelligence. lows. Section 2 explains market efficiency and
In a market that is composed of multiple self- double-auction markets. Section 3 summarizes
interest traders, each of whom has a different related work. In Section 4, the three types of DA
degree of rationality, many unexpected behaviors market we studied are described. Section 5 pres-
may emerge. Our interest in studying the dynamics ents the agent-based GP system used to conduct
of these behaviors is motivated by the increasing our experiments. The analysis of the dynamics of
popularity of Internet auction markets, such as trading behaviors is given in Section 6. Finally,
eBay and Amazon. When designing an auction Section 7 concludes the chapter and outlines our
e-market, in addition to the maximization of future work.
macro market efficiency, the auction rules also
have to consider the dynamics of auctioneers’

79
Bounded Rationality and Market Micro-Behaviors

Figure 1. Demand and Supply Curves of a Market


BACKGROUND

In a standard market environment, the demand for


and supply of a commodity (e.g., cotton, electric-
ity) can be satisfied under different market prices.
The demand curve gives the maximum price that
consumers can accept for buying a given com-
modity, and the supply curve gives the minimum
price at which the producers are willing to sell
that commodity. For example, in Figure 1, the
maximum price that buyers are willing to pay for
the second unit is 26, and the minimum price that
sellers are prepared to accept is 2.
Transactions only take place when the market
price is below the demand curve and above the
supply curve. The area between the demand and
supply curves is the surplus region generated by On the SFTE platform, time is discretized
the transactions. For example, for the second unit, into alternating bid/ask (BA) and buy/sell (BS)
the surplus is 26 − 2 = 24. The distribution of the steps. Initially, the DA market opens with a BA
surplus depends on the transaction price. If the step in which all traders are allowed to simultane-
transaction price is 20, the surplus distributed to ously post bids and asks. After the clearinghouse
the consumer is 26 − 20 = 6, and the rest 18 is informs the traders of each others’ bids and asks,
distributed to the producer. the holders of the highest bid and lowest ask are
In an efficient market, a commodity’s price is matched and enter into a BS step. During the BS
between the demand and supply curves so that all step, the two matched traders perform the transac-
potential surpluses can be fully realized. When the tion using the mid-point between the highest bid
market price is outside the curves, no transaction and the lowest ask as the transaction price. Once
can occur. Consequently, the commodity stays with the transaction is cleared, the market enters into
the producers leaving consumers unsatisfied. This a BA stage for the next auction round. The DA
market is not efficient. A double auction (DA) is market operations are a series of alternating BA
one type of market structure that results in high and BS steps.
market efficiency, and hence is very popular in
the world. For example, the New York Stock
Exchange (NYSE) and the Chicago Mercantile RELATED WORK
Exchange are organized as DA markets. In a DA
market, both buyers and sellers can submit bids and Since the concept of bounded rationality (Simon,
asks. This is in contrasts with a market in which 1997) was introduced more than a decade ago, vari-
only buyers shout bids (as in an English Auction) ous studies on the impact of bounded rationality
or only sellers shout asks (as in a Dutch Auction). in DA markets have been reported. However, the
There are several variations of DA markets. One focus of these works is on macro market efficiency
example is the clearinghouse DA of the Santa Fe instead of the dynamics of traders’ behavior. For
Token Exchange (SFTE) (Rust, Miller, & Palmer, example, Gode and Sunder (1993) conducted ex-
1993) on which this work is based. periments using traders with “zero-intelligence”,
whose bids and offers were randomly generated

80
Bounded Rationality and Market Micro-Behaviors

within their budget constraints (i.e., traders were not permitted to sell below their costs or buy above their values). The DA market with only zero-intelligence traders was able to achieve almost 100% market efficiency. Based on the results, Gode and Sunder argued that the rationality of individual traders accounts for a relatively small fraction of the overall market efficiency.

To investigate the generality of Gode and Sunder's result, other researchers have conducted similar experiments using zero-intelligence traders in various types of DA markets. For example, Cliff and Bruten (1997) studied a DA market with asymmetric supply and demand curves. They found that zero-intelligence traders gave rise to poor market efficiency. They then gave the traders the ability to use the closing price in the previous auction round to determine the current bid. Such traders, which they referred to as zero-intelligence-plus, performed better and improved market efficiency. Thus, individual traders' cognitive ability does impact overall market efficiency.

GP to Implement Bounded Rationality

Mapping an evolutionary process to model bounded rationality in human decision-making was first proposed in Arthur (1993). There, the author extended the key precept of bounded rationality – limits to information and cognition – by positing that learning from experience is important in explaining sub-optimality, the creation of heuristics, and limits to information. During the learning process, individuals improve their strategies through a Darwinian process of learning-by-doing that balances the path-dependent exploration of new strategies (by extending current strategies) against simply exploiting existing strategies (Palmer, Arthur, Holland, LeBaron, & Tayler, 1994).

Genetic Programming (GP) (Koza, 1992) is one evolutionary system that has been used to implement bounded rationality. However, most of the studies are not related to the co-evolution dynamics of individual behaviors. For example, Manson (2006) implemented bounded rationality using GP symbolic regression to study land change in the southern Yucatán peninsular region of Mexico. In that work, each household decision maker is represented as a GP symbolic regression. To best represent bounded rationality in his problem domain, the author investigated 5 different GP parameter settings: the fitness function, creation operator, selection operator, population size and the number of generations.

Previously, we have used GP to implement bounded rationality to study the co-evolution dynamics of traders' behaviors in an artificial DA market (Chen, Zeng, & Yu, 2009; Chen, Zeng, & Yu, 2009a). In that study, the market has two types of traders: GP traders, who have the ability to learn and improve their trading strategies, and naive (no-learning-ability) truth-telling traders, who always present the assigned prices during an auction. To distinguish the cognitive abilities of GP traders, different population sizes were assigned to these traders.

The rationale for this design decision is based on the learning-from-experience analogy of Arthur (1993). In a DA market, a trader's strategies are influenced by two factors: the trader's original ideas of how to bid/ask, and the experiences he/she learned during the auction process. In GP learning, the population is the brain that contains the possible strategies to be used for the next bid/ask. It is therefore reasonable to argue that a GP trader with a bigger population size has a larger reservoir to store and process new strategies, and hence is more intelligent.

We have designed two controlled settings to conduct the experiments. In the first setting, there was only one GP buyer among a group of truth-telling traders. The experimental results show that when assigned a larger population size, the GP buyer was able to evolve a higher-profit strategy, which did not exist when the population size was smaller. Meanwhile, this higher-profit strategy has


Figure 2. The 8 Traders’ DA Market


a more sophisticated structure, which combined
two simpler strategies. These results suggest that
when all other traders have no learning ability,
more “intelligent” traders can make more profit.
In the second setting, the market has two GP
buyers, who co-evolve their strategies to outdo
each other and earn more profits. Various strategies
have emerged from this competitive co-evolution
environment. First, the GP buyer who was only
able to evolve the higher-profit strategy under a
large population size became able to evolve the
same strategy with a smaller population size in
the presence of another GP buyer in the market.
In other words, the competitive environment led the GP buyer to learn more. Second, the strategy that was most used by the GP buyer in the first setting is replaced by a less-profitable strategy in this setting. In other words, the new GP buyer blocked the other GP buyer from learning a more profitable strategy in order to protect its own profit. Third, when both GP traders were given a larger population size (i.e., increased their intelligence), they learned to use more profitable strategies more often and both gained more profits.

These observed GP learning behaviors make intuitive sense. However, can they be generalized to all market types? To answer this question, we have devised three less conventional market types to conduct our experiments. The following section describes these three markets.

THE DA MARKET ENVIRONMENT

The artificial DA market has 4 buyers and 4 sellers, each of whom is assigned 4 private token values for trading. For buyers, these are the 4 highest prices that they are willing to pay to purchase 4 tokens and, for sellers, they are the 4 lowest prices that they are prepared to accept to sell these tokens. All sellers in the market are truth-tellers, who always give the assigned true token value during an auction. For buyers, however, two set-ups were made: one with one GP buyer and one with two GP buyers. The first setting allows us to analyze GP buyers' learning behaviors under stable conditions, and the second one is used to analyze the co-evolution dynamics of the two GP buyers. Figure 2 shows this market environment.

Three different markets, defined by different supply and demand curves, are investigated in this study. Market I has its supply and demand curves intersecting at multiple prices (see Figure 3). This market is unique in that the four buyers have 4 identical token values and the four sellers have the same 4 token values (see Table 1). When all traders are truth-tellers, only 12 of the 16 tokens will be traded. The remaining 4 tokens have their supply price (cost) higher than the demand prices, and hence no transaction can take place. Among the 12 traded tokens, only 4 transactions generate a profit while the other 8 do not, because the 8 tokens have the same demand and supply prices. Also, each of the 4 profitable transactions generates a profit of 4, which is allocated equally to the buyer (profit 2) and the seller (profit 2). Since each trader has only 1 profitable transaction, they all have the same daily profit of 2.

However, what would happen if one or two buyers were equipped with GP learning ability? Are they able to devise strategies that generate a daily profit that is greater than 2? The answer to this question will be given in Section 6.1.

In Market II (Figure 4) and Market III (Figure 5), the supply and demand curves do not intersect.


Figure 3. Market I Demand and Supply Curves

Figure 4. Market II Demand and Supply Curves

Figure 5. Market III Demand and Supply Curves

In other words, there is no inequality between supply and demand. When all traders are truth-tellers, all buyers can purchase the tokens they want and all sellers will sell all the tokens they own. The token values for Market II are given in Table 2 and the token values for Market III are given in Table 3.

In Market II, the daily profit of each truth-telling trader is between 34,859.5 and 34,863.5. In Market III, the daily profit is between 11,937.5 and 11,947.5. Would a GP buyer devise more profitable strategies in these types of markets? The answers will be given in Sections 6.2 and 6.3.

The following section presents the agent-based GP system (Chen & Tai, 2003) we used to conduct our experiments.

THE AGENT-BASED GP SYSTEM

In this agent-based GP system, each GP buyer evolves a population of strategies to use during an auction. The auction strategies are represented as rules. We provide 3 types of information for GP to construct these rules (see Table 4):

• Past experiences: terminals 1–9 and 16–17;
• Time information: terminals 10–11;
• Private information: terminals 12–14.

In this implementation, only transaction information on the previous day is provided for GP to compose bidding strategies. In our future work, we will investigate whether more profitable strategies can be evolved when GP is provided with a longer memory.

The three types of information are combined using logical and mathematical operators to decide the bidding prices. Table 5 lists these operators.
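To make this representation concrete, the following is a minimal Python sketch of how an evolved bidding rule built from the Table 4 terminals and Table 5 operators might be encoded and evaluated. It is illustrative only: the example rule, the MarketInfo container, and the field values are assumptions for exposition, not the strategies or code used in the study.

```python
from dataclasses import dataclass

@dataclass
class MarketInfo:
    """A few of the previous-day / previous-round statistics listed as terminals in Table 4."""
    PMaxAsk: float   # highest asking price on the previous day
    CASK: float      # lowest asking price in the previous auction round
    CBID: float      # highest bidding price in the previous auction round
    HTV: float       # highest token value held by this buyer
    NTV: float       # second highest token value

# A strategy is a small expression tree; here written directly as nested tuples
# of the form (operator, operand, operand) over the Table 4/5 primitives.
example_rule = ("min", ("terminal", "CASK"), ("terminal", "NTV"))

def evaluate(rule, info: MarketInfo) -> float:
    """Recursively evaluate an expression tree against the current market information."""
    kind = rule[0]
    if kind == "terminal":
        return getattr(info, rule[1])
    left = evaluate(rule[1], info)
    right = evaluate(rule[2], info)
    if kind == "min":
        return min(left, right)
    if kind == "max":
        return max(left, right)
    if kind == "+":
        return left + right
    if kind == "-":
        return left - right
    raise ValueError(f"unknown operator {kind}")

info = MarketInfo(PMaxAsk=42, CASK=34, CBID=76, HTV=79, NTV=76)
print(evaluate(example_rule, info))  # bids the smaller of CASK and NTV -> 34
```

A rule of this shape, which anchors the bid to the sellers' revealed prices while never exceeding the buyer's own token values, is similar in spirit to the CASK- and NTV-based strategies analyzed in the results below.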


Table 1. Market I Token Value Table

Buyer1 Buyer2 Buyer3 Buyer4 Seller1 Seller2 Seller3 Seller4


79 79 79 79 75 75 75 75
76 76 76 76 76 76 76 76
76 76 76 76 76 76 76 76
75 75 75 75 79 79 79 79

Table 2. Market II Token Value Table

Buyer1 Buyer2 Buyer3 Buyer4 Seller1 Seller2 Seller3 Seller4


17473 17473 17473 17473 33 34 34 34
17471 17470 17470 17471 34 34 34 34
17465 17465 17465 17465 40 39 39 40
17464 17465 17465 17465 42 42 42 42

Table 3. Market III Token Value Table

Buyer1 Buyer2 Buyer3 Buyer4 Seller1 Seller2 Seller3 Seller4


10518 10519 10516 10521 622 622 618 619
10073 10072 10071 10071 1013 1010 1014 1016
6984 6981 6985 6987 4102 4101 4100 4100
6593 6593 6589 6590 4547 4548 4545 4550
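As a quick check of the Market I baseline described above, the following illustrative calculation reproduces the truth-telling outcome from Table 1: each 79/75 pair trades at the midpoint price of 77 for a profit of 2 on each side, while the 76/76 pairs trade at 76 for no profit. This is a worked arithmetic example, not the simulation code; the helper name is assumed.

```python
def midpoint(bid: float, ask: float) -> float:
    """Transaction price used in this market: the average of the winning bid and ask."""
    return (bid + ask) / 2.0

# Profitable round for a truth-telling buyer/seller pair in Market I (Table 1):
price = midpoint(bid=79, ask=75)      # 77.0
buyer_profit = 79 - price             # 2.0
seller_profit = price - 75            # 2.0

# The 76/76 tokens clear at 76, so neither side gains anything:
zero_profit = 76 - midpoint(76, 76)   # 0.0

print(price, buyer_profit, seller_profit, zero_profit)
```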

Each DA market simulation is carried out with a fixed number of GP generations (g), where each generation lasts n (n = 2 × pop_size) days. On each day, 4 new tokens are assigned to each of the buyers and sellers. The 8 traders then start the auction rounds to trade the 16 tokens. A buyer will start from the token with the highest price and then move to the lower-priced ones, while a seller will start from the one with the lowest price and then move to the higher-priced ones. The day ends when either all 16 tokens have been successfully traded or the maximum number of 25 auction rounds is reached. Any un-traded tokens (due to no matching price) will be cleared at the end of each day. The following day will start with a new set of 16 tokens.

On each day, a GP buyer will randomly select one strategy from its population and use it for the entire day to decide the bidding prices. The strategy might be to pass the round without giving a bid. By contrast, a truth-telling trader never passes an auction: a truth-telling buyer bids the highest value of the tokens it owns, while a truth-telling seller asks the lowest value of the tokens it has. The same 8 strategies will play for the day's 25 auction rounds, during which a GP trader may give a different bidding price if the auction strategy uses information from the previous round/day. The truth-teller, however, will always present the same bid/ask throughout the 25 rounds.

In each auction round, after all 8 traders have presented their prices, the highest bid and the lowest ask will be selected. If there are multiple buyers giving the same highest bid or multiple sellers giving the same lowest ask, one of them will be selected based on their order, i.e. buyer


Table 4. Terminal Set

Index Terminal Interpretation


1 PMax The highest transaction price on the previous day
2 PMin The lowest transaction price on the previous day
3 PAvg The average transaction price on the previous day
4 PMaxBid The highest bidding price on the previous day
5 PMinBid The lowest bidding price on the previous day
6 PAvgBid The average bidding price on the previous day
7 PMaxAsk The highest asking price on the previous day
8 PMinAsk The lowest asking price on the previous day
9 PAvgAsk The average asking price on the previous day
10 Time1 The number of auction rounds left for today
11 Time2 The number of auction rounds that have no transaction
12 HTV The highest token value
13 NTV The second highest token value
14 LTV The lowest token value
15 Pass Pass the current auction round
16 CASK The lowest asking price in the previous auction round
17 CBID The highest bidding price in the previous auction round
18 Constant Randomly generated constant number

Table 5. Function Set

+, -, *, %, min
>, exp, abs, log, max
sin, cos, if-then-else, if-bigger-then-else

(seller) 1 will be picked prior to buyer (seller) 2; buyer (seller) 2 will be picked before buyer (seller) 3, and so on. If the highest bid is equal to or greater than the lowest ask, there is a match and the transaction takes place using the average of the bid and ask as the final price. The profit from the two strategies (the difference between the transaction and the given token values) is recorded. The fitness of the strategy, F, is the accumulated profit from the traded tokens during the day:

F = \sum_{i=1}^{m} (TokenValue_i - TransactionValue_i)

where m is the number of tokens traded using the strategy. Since one strategy is randomly selected each day to carry out the auction, after n = 2 × pop_size days, each strategy in the GP population will most likely have been selected at least once and will have a fitness value at the end of each generation. This fitness value decides how each strategy will be selected and altered to generate the next generation of new strategies. This sampling scheme, where each strategy is sampled either once or twice, might be too small to represent the GP learning process. We plan to carry out more studies, increasing the length of the auction period and evaluating the impact on GP learning behaviors.

When evaluating a GP-evolved trading strategy, it is possible that the output price is outside the token value range. That is, the GP buyer may buy above the value of a token. We might interpret this as a strategic way to win the auction. Since


Table 6. GP Parameters

Parameter Value Parameter Value


tournament size 5 elitism size 1
initialization method grow max tree depth 5
population size (pop_size) 10, 50 no. of days 2 × pop_size
crossover rate 100% subtree mutation 0.5%
no. of generation 200 point mutation 0.45%
no. of runs per setup 90 no. of GP trader 1, 2
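Before turning to the results, the auction-round matching and the daily fitness bookkeeping described above can be summarized in a short sketch. It is a simplified illustration under the stated rules (highest bid against lowest ask, ties broken by trader order, midpoint transaction price, fitness as accumulated daily profit); the function names and data layout are assumptions, not the authors' code.

```python
def run_auction_round(bids, asks):
    """One discrete DA round. `bids` and `asks` are lists of (trader_id, price) in trader order.

    Returns (buyer_id, seller_id, price) if a transaction occurs, else None.
    Ties on price are broken by list order, i.e. buyer/seller 1 before 2, and so on,
    because max()/min() return the first extremal element they encounter.
    """
    buyer_id, best_bid = max(bids, key=lambda t: t[1])
    seller_id, best_ask = min(asks, key=lambda t: t[1])
    if best_bid >= best_ask:
        return buyer_id, seller_id, (best_bid + best_ask) / 2.0
    return None

def daily_fitness(transactions):
    """Fitness F of the day's strategy: sum over traded tokens of (token value - transaction price)."""
    return sum(value - price for value, price in transactions)

# Example: all buyers bid 76, seller 1 still holds a 75-cost token.
trade = run_auction_round(bids=[(1, 76), (2, 76), (3, 76), (4, 76)],
                          asks=[(1, 75), (2, 76), (3, 76), (4, 76)])
print(trade)                               # (1, 1, 75.5): buyer 1 wins the tie
print(daily_fitness([(79, 75.5)]))         # 3.5
```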

the final price is the average of the winning bid and ask, the buyer might still make a profit from the transaction. However, such a risk-taking approach has been shown to make the market unstable and to reduce market efficiency (Chen & Tai, 2003). We therefore enforce the following rule on the price generated by a GP-evolved strategy:

if Bid > 2 × HTV then Bid = HTV

This rule protects the market from becoming too volatile and also allows GP to evolve rules that take on a small amount of risk to generate a profit.

Table 6 gives the GP parameter values used to perform the simulation runs. With 2 different population sizes (10, 50) and 2 different ways to assign GP buyers, the total number of setups is 4. For each setup, we made 90 runs. The number of simulation runs made for each market is therefore 360. With 3 market types, the total number of simulation runs is 1,080.

RESULTS AND ANALYSIS

For each market type, we analyze two scenarios: one GP buyer in the market and two GP buyers in the market. In the first case, the focus is on how the GP population size influences the learning of strategies. In the second case, besides population size, we also investigate the co-evolution dynamics of the two GP buyers and its impact on the learning of strategies.

To conduct our analysis, we collected all evolved strategies and their daily profit (F) generated during the last 10 generations of each run. We consider these strategies to be more "mature", and hence to better represent the GP buyers' trading patterns.

When the population size is 10, each generation is 2 × 10 = 20 days long. On each day, one strategy is picked randomly from the population to conduct the auction. The total number of strategies used during the last 10 generations is therefore 20 × 10 = 200. Since we made 90 runs for this setup, the number of strategies used to conduct our analysis is 200 × 90 = 18,000.

When the population size is 50, each generation is 2 × 50 = 100 days long. The total number of auction days (also the number of strategies picked to conduct the auction) during the last 10 generations for all 90 runs is 100 × 10 × 90 = 90,000. The following subsections present our analysis of these GP-evolved strategies under the three different markets.

Market I

One GP Buyer in the Market

When there is one GP buyer (with population size 10) in this market, the daily profit (F) generated by the 18,000 strategies is between -41 and 3.5 (see Table 7). Among them, more than 95% of the strategies give a profit that is greater than 2, which is better than that produced by a naive


Table 7. Market I Evolved Strategies (Pop size 10)

Profit -41 -12 -2.5 -2 0 0.5 1.5 2 2.5 3 3.5 Total


Count 6 8 37 72 378 92 15 181 475 126 16,610 18,000
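The totals in Table 7 follow directly from the sampling scheme described above; the bookkeeping can be sketched as follows (function name and defaults are illustrative only).

```python
def strategies_analyzed(pop_size: int, last_generations: int = 10, runs: int = 90) -> int:
    """Strategy uses collected for analysis: one strategy is used per day,
    each generation lasts 2 * pop_size days, and only the last generations are kept."""
    days_per_generation = 2 * pop_size
    return days_per_generation * last_generations * runs

print(strategies_analyzed(10))   # 18,000 strategy uses (as in Table 7)
print(strategies_analyzed(50))   # 90,000 strategy uses
```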

Table 8. Market I Profit 3.5 Strategies (Pop size 10)

Strategy Profit Count Ratio (Count/18,000)


Length ≥ 2 3.5 1,653 0.0918
NTV 3.5 14,957 0.8309
Total 16,610 0.9228

Table 9. Market I Profit 3.5 Strategies (Pop size 50)

Strategy Profit Count Ratio (Count/90,000)


Length ≥ 2 3.5 5,958 0.0061
NTV 3.5 77,911 0.8657
Total 83,861 0.9318

truth-teller (see Section 4). This indicates that the GP buyer is more "intelligent" than the naive truth-telling buyers.

The strategies that generate a daily profit of 3.5 can be divided into two categories: NTV (the second highest token value) and those with a length greater than or equal to 2. As shown in Table 8, NTV was used to conduct more than 83% of the auctions, and we therefore decided to study how it generated the higher profit.

This strategy is actually quite smart: it bids the second highest token value when all other truth-telling buyers bid the highest token price. During the first 3 auction rounds, when at least one truth-telling buyer bid the highest token value of 79, the GP buyer, who bid the second highest token value of 76, could not win the auction. However, after the 3 truth-telling buyers purchased their first tokens and each earned a profit of 2, they moved to bid the next highest token value of 76. Since buyer 1, the GP buyer, was preferred when there were multiple buyers giving the same highest bid (see Section 5), the GP buyer won the 4th auction round and performed the transaction using the average of the highest bid (76) and the lowest ask (75), which was 75.5. The token value that the GP buyer was purchasing was 79, so the profit of this transaction for the GP buyer is 79 − 75.5 = 3.5. In a market where all buyers have the same token values, this "waiting until all other buyers have purchased their tokens before winning the auction" is a more profitable strategy.

Did the more "intelligent" (population size 50) GP buyer devise a better strategy? We examined all 90,000 strategies but did not find one. Table 9 shows that the more "intelligent" GP buyer used the profit-3.5 strategies slightly more often to conduct the auction (93% vs. 92%). Other than that, there was no significant difference between the behaviors of the GP buyers with population sizes of 10 and 50. This suggests that in a stable (all other traders are truth-tellers) market where all buyers have the same token values and all sellers have the same token values, a small degree of intelligence is sufficient to devise the optimal


Table 10. Market I: Strategies Used by 2 GP Buyers

Population size Buyer Strategy Profit Count Ratio


P10 buyer 1 NTV 4 16,414 0.9119
buyer 2 NTV 3 16,265 0.9036
P50 buyer 1 NTV 4 83,283 0.9254
buyer 2 NTV 3 82,996 0.9222

strategy (the one that generates a daily profit of 3.5 is the optimal one in this market). Any increase in the traders' intelligence/rationality has no significant impact on their behaviors. In other words, the relationship between intelligence and performance is not visible.

Two GP Buyers in the Market

When both buyers 1 and 2 are equipped with GP learning ability, the trading behaviors become more complicated. Table 10 gives information about the 2 most used strategies by the 2 GP buyers under population sizes of 10 and 50.

It appeared that both GP buyers learned the NTV strategy. When they used this strategy to bid against each other, GP buyer 1 earned a daily profit of 4 while GP buyer 2 earned a daily profit of 3. How did this happen?

We traced the market's daily transactions and found that the bias in the market setup gives GP buyer 1 an advantage over GP buyer 2, who also has an advantage over buyers 3 & 4. During the first 2 auction rounds, each of the two truth-telling buyers (who bid 79) won one auction round and made a profit of 2 by carrying out the transaction using a price of (79 + 75)/2 = 77. In round 3, all buyers bid the second highest token value of 76. However, buyer 1, a GP buyer, is selected, based on the market setup, to carry out the transaction using the price of (76 + 75)/2 = 75.5. The profit earned by buyer 1 is therefore 79 − 75.5 = 3.5. In the next auction round, all buyers bid 76 again and buyer 1 is again selected to carry out the transaction using the price of 75.5. Since GP buyer 1 is purchasing its second token, whose value is 76, the profit for this transaction is 76 − 75.5 = 0.5. After that, GP buyer 1 did not make any profitable transaction, and its total daily profit is 4.

The second buyer, who also has GP learning ability, only gets to win the auction in round 5, when the first GP buyer has purchased two tokens. In round 5, GP buyer 1 bids its next highest token value of 75 (see Table 1) and all other buyers bid 76. Buyer 2, a GP buyer, is selected over buyers 3 and 4 to carry out the transaction using the price (76 + 76)/2 = 76 (note that all 4 sellers are trading their second lowest token, with a value of 76, as each has sold its 75 token during the first 4 auction rounds). Since GP buyer 2 is purchasing its first token with value 79, the profit gained in this transaction is 79 − 76 = 3. After that, no market transactions are profitable, due to the increase in seller token prices and the decrease in buyer token prices. The second GP buyer earned a total daily profit of 3.

When the population size of both GP buyers is increased to 50, Table 10 shows that there is no significant difference in their behaviors. This might also be due to the market type, as explained previously.

Market II

One GP Buyer in the Market

In Market II, the supply and demand curves are almost parallel with each other. A naive truth-telling strategy would trade all 16 tokens successfully. What kind of strategies would the GP


Table 11. Market II (Pop size 10)

Profit 69,705 Strategy Count Ratio


if bigger then else PMinAsk LT abs(sin PMin) Max CASK (- Pass Times2) 1 0.0001
42 22 0.0012
CASK 6,728 0.3738
PMaxAsk 3,428 0.1904
PMin 2,034 0.1130
PMinBid 56 0.0031
Total 12,269 0.6816

buyers evolve under this market environment? We examined all 18,000 strategies that were evolved by the GP buyer with population size 10 and found that they had 115 different daily profit values. Among them, the highest profit is 69,705, which is much higher than the profit earned by the truth-telling strategy (34,863.5). Since strategies with a profit of 69,705 were used the most (68%) during the auctions, we decided to analyze how they earned the higher profit.

Table 11 gives the strategies that produce a daily profit of 69,705. Among them, the 3 most used strategies are:

• CASK: the lowest asking price in the previous auction round;
• PMaxAsk: the highest asking price on the previous day;
• PMin: the lowest transaction price on the previous day.

One common feature of these 3 strategies is that they all used information from the previous transactions (either on the previous day or in the last auction round) to decide the current bidding price. This is actually a very wise strategy in this type of market, where the quantities of supply and demand are equal (16). Under such conditions, a buyer can bid any price to win 4 auction rounds as long as the price is above the sellers' token values. So, the closer the bidding price is to the sellers' token values, the higher the profit is for the buyer. However, where can a buyer find the sellers' token values? One source is the sellers' asking prices in the auction that took place on the previous day (PMaxAsk) or in the previous auction round (CASK). Another source is the transaction prices of the auction that took place on the previous day (PMin) or in the previous auction round. The GP buyer has learned that knowledge and has been able to use the acquired information to make the lowest possible bid. Consequently, its profit is far above the profit made using the truth-telling strategy.

Did the "smarter" GP buyer (with population size 50) devise a more profitable strategy for this type of market? We examined all 90,000 strategies but did not find one. Table 12 shows that the best strategies are still those that give a daily profit of 69,705. However, the "smarter" GP buyer used this type of higher-profit strategy more frequently (86% vs. 68%) to conduct an auction. This indicates that in a stable (all other traders are truth-tellers) market where the supply and demand quantities are the same, more intelligent GP buyers exhibit more intelligent behavior by using the higher-profit strategies more frequently.

Two GP Buyers in the Market

When both buyers 1 and 2 are equipped with GP learning ability, the market becomes more competitive as both of them devise strategies to outdo each other to make a profit. Under such a


Table 12. Market II (Pop size 50)

Profit 69,705 strategies Count Ratio (Count/90,000)


Length ≥ 2 2,995 0.0333
42 11 0.0001
CASK 49,196 0.5466
PMaxAsk 10,652 0.1184
PMin 11,452 0.1272
PMinBid 3,558 0.0395
Total 77,864 0.8652

Table 13. Market II: Strategies Used by GP Buyers

Population size Buyer Profit Count Ratio Total


P10 buyer 1 69,705 5,079 0.2822
69,710 3,903 0.2168
69,715 5,454 0.3030 0.8020
buyer 2 69,705 9,885 0.5492
69,710 4,395 0.2442 0.7933
P50 buyer 1 69,705 26,480 0.2942
69,710 32,174 0.3575
69,715 16,281 0.1809 0.8326
buyer 2 69,705 51,756 0.5751
69,710 19,739 0.2193 0.7944

competitive environment, we found that both GP buyers evolved more profitable strategies than were evolved when the market only had one GP buyer. Table 13 gives the strategies evolved by the two GP buyers with population sizes 10 and 50.

Although both GP buyers evolved higher-profit strategies, GP buyer 1 evolved one group of strategies that GP buyer 2 did not evolve: the group that gives the highest profit of 69,715. Moreover, GP buyer 1 applied the two higher-profit strategies (with profits 69,710 and 69,715) more frequently than GP buyer 2 did. This suggests that GP buyer 1 won the competition during the co-evolution of bidding strategies in this market. Was this really the case?

We examined all strategies and found that both GP buyers have learned to use past transaction information, such as CASK, PMaxAsk and PMin, to make more profitable bids. Depending on which of these strategies were used against each other, GP buyers 1 and 2 each earned a different amount of profit. If both GP buyers used CASK, they would give the same bids. Under the market mechanism where buyer 1 was preferred over buyer 2, GP buyer 1 won the bid and earned the higher profit (69,715 vs. 69,705). However, if one GP buyer used PMaxAsk and the other used CASK, the one that used PMaxAsk would earn more profit (69,710 vs. 69,705). This is because PMaxAsk bid the highest token price of the sellers (42) while CASK bid the lowest asking price of the sellers in the previous auction round, which can be 33, 34, 39, 40 or 42 (see Table 2). Consequently, PMaxAsk won the auction and


Table 14. Market III: Most Used Strategies

Population size Strategy Profit Count Ratio Total


P10 CASK 15,978 14,882 0.8268 0.8268
P50 Length ≥ 2 15,978 4,331 0.0481
CASK 15,978 71,091 0.7899 0.8380

earned the higher profit. Table 13 shows that GP buyer 1 used the profit-69,715 strategies most frequently. This indicates that GP buyer 1 won the competition due to the advantage it received from the market setup bias.

When the population size of the two GP buyers was increased to 50, the market setup bias no longer dominated the market dynamics. Instead, GP buyer 2 started to use PMaxAsk more often against GP buyer 1's CASK and earned the higher profit. As a result, the frequency with which GP buyer 1 earned a profit of 69,715 was reduced (from 30% to 18%). Again, more intelligent GP buyers exhibited different behavior under the co-evolution setting in this market.

Market III

One GP Buyer in the Market

Market III is similar to Market II in that the quantities of supply and demand are equal (16). Did the GP buyer learn to use information from the previous auction to obtain the sellers' token prices and make the lowest bid to earn the most profit? Table 14 gives the most used strategies by the GP buyer with population sizes 10 and 50. It is clear that the GP buyer has learned that knowledge. The most frequently used strategy is CASK, which earned a daily profit of 15,978. This profit is much higher than the profit earned by the truth-telling strategy (11,947.5).

The more intelligent GP buyer (who had a population size of 50) developed a similar style of strategies that gave a profit of 15,978. However, a small number of these strategies had more complex structures, such as Max CASK CASK and If-bigger-then-else CASK CASK CASK CASK. This is an understandable behavior change, since a larger population size gives GP more room to maintain the same-profit strategies with more diversified structures.

Two GP Buyers in the Market

Similar to the strategies in Market II, when both buyers 1 and 2 were equipped with GP learning ability, they both evolved strategies that earned more profit than those evolved when there was only 1 GP buyer in the market. Another similarity is that GP buyer 1 evolved the strategies that earned the highest profit of 17,765, which GP buyer 2 did not evolve. Table 15 gives the most used strategies of the 2 GP buyers with population sizes of 10 and 50.

The strategies also have similar dynamics to those in Market II. When both GP buyers used CASK, GP buyer 1 had an advantage and earned 17,765 while GP buyer 2 earned 15,975. When one GP buyer used CASK and the other used PMaxAsk, the one that used PMaxAsk earned a higher profit (16,863.5 vs. 15,978).

However, in this market, GP buyer 1 only used the strategies that earned 17,765 in 15% of the auctions when both GP buyers had a population size of 10. This indicates that the market mechanism bias could not make buyer 1 win the competitive co-evolution in this market. Other factors, such as the supply and demand prices of the 16 tokens, also influenced the co-evolution dynamics.

Another type of GP buyers' behavior, which was different from that in Market II, was that when


Table 15. Market III: Strategies of the 2 GP Buyers

Population size Buyer Profit Count Ratio Total


P10 buyer 1 15,978 4,079 0.2266
16,866.5 2,050 0.1139
17,765 2,809 0.1561 0.4966
buyer 2 15,975 7,754 0.4308
16,863.5 853 0.0474 0.4782
P50 buyer 1 15,978 32,764 0.3640
16,866.5 9,666 0.1074
17,765 26,300 0.2922 0.7637
buyer 2 15,975 38,051 0.4228
16,863.5 18,945 0.2105 0.6333

the population size was increased to 50, both GP buyers increased their usage of the higher-profit strategies. In other words, the more intelligent GP buyers learned to co-operate with each other instead of competing with each other, which seems to be the behavior of the two GP buyers in Market II. More intelligent GP buyers also exhibited different behaviors in this market.

CONCLUDING REMARKS

In all three markets we have studied, the co-evolution of two self-interested GP buyers has produced more profitable transactions than when there was only one GP buyer in the market. This phenomenon was also observed in our previous work (Chen, Zeng, & Yu, 2009; Chen, Zeng, & Yu, 2009a): the overall buyer profit increases as the number of GP buyers increases in the market studied. In other words, an individual pursuing his own self-interest also promotes the good of his community as a whole. Such behavior is similar to that of humans in real markets, as argued by Adam Smith. Although we have only studied the case where only buyers have GP learning ability, this result suggests that, to some degree, the GP trader agents have qualities similar to those of humans in decision-making. Meanwhile, the co-evolution dynamics in the devised artificial DA market resemble the dynamics of real markets. We will continue to investigate the market dynamics when both buyers and sellers have GP learning ability.

Our analysis of the GP-evolved strategies shows that individual GP buyers with different degrees of rationality may exhibit different behavior depending on the type of market they are in. In Market I, where all buyers have the same token values and all sellers have the same token values, the behavioral difference is not significant. However, in Markets II & III, where the supply and demand prices leave room to exploit a higher profit, more intelligent GP buyers exhibit more intelligent behavior, such as using higher-profit strategies more frequently or cooperating with each other to earn more profits. In Chen, Zeng, & Yu (2009, 2009a), a similar GP buyer behavioral difference was reported in the market studied. This suggests that the intelligent behavior of a GP trader becomes visible when the market has a profit zone to explore.

All of the observed individual traders' learning behaviors make intuitive sense. Under the devised artificial DA market platform, GP agents demonstrate human-like rationality in decision-making. We plan to conduct human subject experiments to validate these results in the near future.


REFERENCES

Arthur, W. B. (1993). On designing economic agents that behave like human agents. Journal of Evolutionary Economics, 3, 1–22. doi:10.1007/BF01199986

Chattoe, E. (1998). Just how (un)realistic are evolutionary algorithms as representations of social processes? Journal of Artificial Societies and Social Simulation, 1.

Chen, S.-H., & Tai, C.-C. (2003). Trading restrictions, price dynamics and allocative efficiency in double auction markets: An analysis based on agent-based modeling and simulations. Advances in Complex Systems, 6(3), 283–302. doi:10.1142/S021952590300089X

Chen, S.-H., Zeng, R.-J., & Yu, T. (2009). Co-evolving trading strategies to analyze bounded rationality in double auction markets. In Riolo, R., Soule, T., & Worzel, B. (Eds.), Genetic Programming: Theory and Practice VI (pp. 195–213). Springer. doi:10.1007/978-0-387-87623-8_13

Chen, S.-H., Zeng, R.-J., & Yu, T. (2009a). Analysis of micro-behavior and bounded rationality in double auction markets using co-evolutionary GP. In Proceedings of the World Summit on Genetic and Evolutionary Computation. ACM.

Cliff, D., & Bruten, J. (1997). Zero is not enough: On the lower limit of agent intelligence for continuous double auction markets (Technical Report HP-97-141). HP Technical Report.

Edmonds, B. (1998). Modelling socially intelligent agents. Applied Artificial Intelligence, 12, 677–699. doi:10.1080/088395198117587

Gode, D. K., & Sunder, S. (1993). Allocative efficiency of markets with zero-intelligence traders: Markets as a partial substitute for individual rationality. The Journal of Political Economy, 101, 119–137. doi:10.1086/261868

Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.

Manson, S. M. (2006). Bounded rationality in agent-based models: Experiments with evolutionary programs. International Journal of Geographical Information Science, 20(9), 991–1012. doi:10.1080/13658810600830566

Palmer, R. G., Arthur, W. B., Holland, J. H., LeBaron, B., & Tayler, P. (1994). Artificial economic life: A simple model of a stock market. Physica D: Nonlinear Phenomena, 75(1-3), 264–274. doi:10.1016/0167-2789(94)90287-9

Roth, A. E., & Ockenfels, A. (2002). Last-minute bidding and the rules for ending second-price auctions: Evidence from eBay and Amazon auctions on the Internet. The American Economic Review, 92, 1093–1103. doi:10.1257/00028280260344632

Rust, J., Miller, J., & Palmer, R. (1993). Behavior of trading automata in a computerized double auction market. In Friedman, D., & Rust, J. (Eds.), The Double Auction Market: Institutions, Theories and Evidence (pp. 155–198). Addison-Wesley.

Simon, H. A. (1997). Behavioral economics and bounded rationality. In Simon, H. A. (Ed.), Models of Bounded Rationality (pp. 267–298). MIT Press.

Smith, V. (1976). Experimental economics: Induced value theory. The American Economic Review, 66(2), 274–279.

KEY TERMS AND DEFINITIONS

Agent-based Modeling: A class of computational models for simulating the actions and interactions of autonomous agents (both individual and collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole.


Genetic Programming (GP): An evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task.
Bounded Rationality: A concept based on the fact that the rationality of individuals is limited by the information they have, the cognitive limitations of their minds, and the finite amount of time they have to make decisions.
Co-evolution: The change of a biological or artificial agent triggered by the change of a related agent.
Double Auction: A process of buying and selling goods in which potential buyers submit their bids and potential sellers simultaneously submit their ask prices to an auctioneer, and the auctioneer then chooses some price p that clears the market: all the sellers who asked less than p sell, and all buyers who bid more than p buy, at this price p.


Chapter 6
Social Simulation with
Both Human Agents and
Software Agents:
An Investigation into the
Impact of Cognitive Capacity
on Their Learning Behavior
Shu-Heng Chen
National Chengchi University, Taiwan

Chung-Ching Tai
Tunghai University, Taiwan

Tzai-Der Wang
Cheng Shiu University, Taiwan

Shu G. Wang
Chengchi University, Taiwan

ABSTRACT
In this chapter, we will present agent-based simulations as well as human experiments in double auc-
tion markets. Our idea is to investigate the learning capabilities of human traders by studying learning
agents constructed by Genetic Programming (GP), and the latter can further serve as a design platform
in conducting human experiments. By manipulating the population size of GP traders, we attempt to
characterize the innate heterogeneity in human being’s intellectual abilities. We find that GP traders
are efficient in the sense that they can beat other trading strategies even with very limited learning ca-
pacity. A series of human experiments and multi-agent simulations are conducted and compared for an
examination at the end of this chapter.

DOI: 10.4018/978-1-60566-898-7.ch006


INTRODUCTION

The double auction is the core trading mechanism for many commodities, and therefore a series of human experiments and agent-based simulation studies have been devoted to studying the price formation processes or to looking for effective trading strategies in such markets. Among these studies, experiments on human-agent interactions such as Das, Hanson, Kephart, & Tesauro (2001), Taniguchi, Nakajima, & Hashimoto (2004), and Grossklags & Schmidt (2006), as well as computerized trading tournaments such as Rust, Miller, & Palmer (1993, 1994), have exhibited a general superiority of computerized trading strategies over learning agents, where the learning agents may stand for learning algorithms or human traders.

In Rust, Miller, & Palmer (1993, 1994)'s trading program tournaments, adaptive trading strategies did not exhibit the human-like "intuitive leaps" that human traders seem to make in conjecturing good strategies based on limited trading experiences, although Rust, Miller, & Palmer (1993, 1994) also expected that most human traders would not be able to outperform software trading strategies because of their limited computational capability.

Rust, Miller, & Palmer (1993, 1994)'s conjecture became evident when a series of human-agent interaction experiments was conducted. In Das, Hanson, Kephart, & Tesauro (2001), Taniguchi, Nakajima, & Hashimoto (2004), and Grossklags & Schmidt (2006)'s studies, in most of the situations, human traders cannot compete with their software counterparts. It seems that human traders could learn. However, due to some uncertain limitations, they just cannot win.

The ineffectiveness of learning behavior in double auction markets raises an interesting question: if learning is ineffective, then it implies that all human trading activities in double auction markets should have been replaced by trading programs, since programs can perform better and more quickly. However, this is not the case in real markets. So, what is the unique property of human learning behavior in double auction markets?

For the above question, Rust, Miller, & Palmer (1994) speculated that human traders do not outperform software strategies because they are constrained by their computational capacity, but they do have the advantage of being adaptive to a wide range of circumstances:

"The key distinction is adaptivity. Most of the programs are 'hardwired' to expect a certain range of trading environments: if we start to move out of this range, we would expect to see a serious degradation in their performance relative to humans. … Anyone who has actually traded in one of these DA markets realizes that the flow of events is too fast to keep close track of individual opponents and do the detailed Bayesian updating suggested by game theory." (Rust, Miller, & Palmer, 1994, p. 95)

Thus, learning ability constrained by computational capacity could be not only an explanation of how human traders are different from software strategies, but also the source of the heterogeneity observed among human decision makers.

Obviously, unless the innate property of human learning behavior is captured and well characterized in software agents, we cannot build a sufficiently adequate model to describe what happens in real auction markets, let alone move on to evaluate alternative market institutions with agent-based systems populated by autonomous agents. As a result, agent-based computational economists have contributed much in discovering or inventing proper algorithms based on human experiments to describe human learning processes. However, besides Casari (2004), not much has been done to consider the impact of cognitive capacity on human traders' learning ability.

Therefore, in this chapter we initiate a series of experiments to test the possibility of constructing learning agents constrained by their cognitive capacity. We achieve this by modeling learning
agents with Genetic Programming (GP), and then we manipulate their "cognitive capacity" or "computational capacity" by assigning GP traders populations of different sizes.

Unlike common practice, we do not construct our agents based on the results of human experiments, because human factors are so difficult to control and observe that it might not be easy to elicit definitive conclusions from human experiments. Instead, we adopted the Herbert Simon way of studying human behavior: understanding human decision processes by conducting computer simulations. Thus the agent-based simulations in this chapter are not only research instruments used to test our conjecture, but they also serve as the design platform for human experiments.

We first model learning agents with Genetic Programming, and the population sizes of GP traders are regarded as their cognitive capacity. These GP traders are sent to the double auction markets to compete with other designed strategies. With the discovery of the capability of GP learning agents, we further conduct human experiments where human traders encounter the same opponents as the GP agents did. By comparing the behavior and learning process of human traders with those of GP agents, we have a chance to form a better understanding of human learning processes.

This chapter is organized as follows: Section 2 will introduce related research to supply the background knowledge needed for this study. Section 3 depicts the experimental design, including the trading mechanism, software trading strategies, the design of GP learning agents, and experimental settings. The results, evaluations, and analysis of the experiments are presented in Section 4. Section 5 provides the concluding remarks.

LITERATURE REVIEW

In this section, we will present a series of related studies which inspired the research behind this chapter or provided the foundation of our research method. First, we will go through several double auction experiments to support our research question. Second, we will conduct a brief survey of how cognitive capacity has been found to be decisive in human decision making. In the end, we will talk about how to model cognitive capacity in agent-based models.

Trading Tournaments in Double Auction Markets

The pioneering work in exploring individual characteristics of effective trading strategies in double auction markets consists of Rust, Miller, & Palmer (1993, 1994)'s tournaments held at the Santa Fe Institute. Rust, Miller, & Palmer (1993, 1994) collected 30 trading algorithms and categorized them according to whether they were simple or complex, adaptive or non-adaptive, predictive or non-predictive, stochastic or non-stochastic, and optimizing or non-optimizing.

Rust, Miller, & Palmer (1993, 1994) conducted the double auction tournament in a very systematic way. They proposed a random token generation process to produce the demand and supply schedules needed in their tournaments. A large number of simulations covering various kinds of market structures were performed, and an overall evaluation was made to distinguish effective strategies from poor ones.

The result was rather surprising: the winning strategy was simple, non-stochastic, non-predictive, non-optimizing, and, most importantly, non-adaptive. In spite of this, other strategies possessing the same characteristics still performed poorly. As a result, it remains an open question "whether other approaches from the literature on artificial intelligence might be sufficiently powerful to discover effective trading strategies." (Rust, Miller, & Palmer, 1994, pp. 94–95)

It is important to note that there are certain sophisticated strategies in Rust, Miller, & Palmer (1993, 1994)'s tournaments, and some of them
even make use of an artificial intelligence algorithm as the learning scheme. Compared to simple strategies, such learning agents did not succeed in improving their performance within a reasonable period of time. Therefore, Rust, Miller, & Palmer (1994) deemed that humans may perform better because they can generate good strategies based on very limited trading experiences.

The comparisons of learning agents versus designed strategies have assumed a different form in a series of human-agent interaction studies. Three projects will be introduced here, including Das, Hanson, Kephart, & Tesauro (2001), Taniguchi, Nakajima, & Hashimoto (2004), and Grossklags & Schmidt (2006).

Das, Hanson, Kephart, & Tesauro (2001) employed a continuous double auction market as the platform and had human traders compete with two software trading strategies, ZIP and GD. ZIP is an adaptive strategy proposed by Cliff & Bruten (1997), and the GD strategy was proposed by Gjerstad & Dickhaut (1998). In order to investigate the potential advantage of software strategies due to their speed, Das, Hanson, Kephart, & Tesauro (2001) distinguished fast agents from slow agents by letting slow agents 'sleep' for a longer time. Human traders encounter three kinds of opponents, namely GD Fast, ZIP Fast, and ZIP Slow opponents, in Das, Hanson, Kephart, & Tesauro (2001)'s experiments.

The results show that regardless of whether the agents are fast or slow, they all surpass human traders and keep a very good lead. Although human traders seem to improve over time, they still cannot compete with software strategies at the end of the experiments.

The superiority of software strategies is further supported by Taniguchi, Nakajima, & Hashimoto (2004)'s futures market experiments. Taniguchi, Nakajima, & Hashimoto (2004) use the U-Mart futures market as the experimental platform, where human traders and random-bidding agents compete to buy or sell contracts at the same time. Before the experiments, Taniguchi, Nakajima, & Hashimoto (2004) trained their human subjects for 90 minutes with knowledge about the futures and stock markets, related technical and fundamental trading strategies, and the operation of the trading interface. In addition to this arrangement, the trading mechanism of U-Mart, named "Itayose," is special in that the market matches the outstanding orders every 10 seconds. As a result, human traders have more time to contemplate and make their bids or offers. Both of the above designs give human traders more of an advantage in competing with the randomly bidding software strategies.

However, the results show that human traders have poorer performance than the software agents, although there is a human trader who learns to speculate and can defeat the software strategies. In spite of their results, Taniguchi, Nakajima, & Hashimoto (2004)'s experiments still exhibit the possibility of defeating software strategies with human intelligence.

Unlike previous studies, Grossklags & Schmidt (2006)'s research question is more distinctive: they want to know whether human traders will behave differently when they know there are software agents in the same market. In their futures markets, they devised a software agent called "Arbitrageur." Arbitrageur's trading strategy is simple: sell bundles of contracts when their prices are above the reasonable price, and buy bundles of contracts when their prices are below the reasonable price. This is a very simple strategy which human traders may also adopt. However, the software agents make positive profits in 11 out of the total of 12 experiments. Because this is a zero-sum game, the software agents' positive performance means that losses are incurred by human traders, although the differences are not statistically significant.

Similar to Rust, Miller, & Palmer (1993, 1994)'s results, Grossklags & Schmidt (2006)'s results, together with Das, Hanson, Kephart, & Tesauro (2001)'s and Taniguchi, Nakajima, & Hashimoto (2004)'s findings, demonstrate a
general picture in which it is difficult for human traders to compete with software agents, even if the software agents are very simple. Learning agents (either software ones or humans) can hardly defeat designed strategies in a short period of time. Nevertheless, we can also observe from these experiments that learning agents (either software or human) may have the chance to defeat software strategies if they have enough time to learn. Then some questions naturally emerge: in what situations can learning agents outperform other software strategies? Is there any mechanism which has an influence on learning agents' learning behavior?

To answer the first question, we need to conduct experiments where learning agents can exert all their potential to win the game. Considering the cost of testing human traders in various situations with long time horizons, we adopt another approach: we conduct agent-based simulations with learning agents to examine their winning conditions first, and then we run human experiments to see how things develop when learning software traders are replaced by learning humans. In selecting an appropriate algorithm to model our learning agent, we choose Genetic Programming because, as Chen, Zeng, & Yu (2008)'s research shows, GP traders can evolve and adapt very efficiently in a double auction market.

Cognitive Ability and Learning Behavior

The answer to the second question raised at the end of the last section is not so obvious. To find possible factors influencing people's learning behavior, we have to consult the scientific disciplines which have paid much attention to this issue. Fortunately, this question has been investigated by psychologists and decision scientists for a long time.

To look into possible factors of learning, we have to realize that the reason why people have to learn lies in their bounded rationality. Because humans are boundedly rational, they cannot find the optimal solutions at the beginning and have to improve their performance based on their experiences.

Information and cognitive capacity are two important sources of bounded rationality for human decision makers. While economists, whether theorists or experimentalists, have mainly emphasized the importance of information, the significance of cognitive capacity has been temporarily mislaid but has started to regain its position in economics experiments in recent years.

Some of the earliest experimental ideas concerning cognitive capacity came from Herbert Simon, who was the initiator of bounded rationality and was awarded the Nobel Memorial Prize in Economics. In the "concept formation" experiment (Gregg & Simon, 1979) and the arithmetic problem (Simon, 1981), Simon pointed out that the problem is strenuous or even difficult to solve not because human subjects did not know how to solve the problem, but mainly because, without decision supports such as paper and pencil, such tasks can easily overload human subjects' working memory capacity and influence their performance (Simon, 1981, 1996).

In the realm of psychology, Payne, Bettman, & Johnson (1993)'s research can be a good foundation for our research. Payne, Bettman, & Johnson (1993) pointed out that humans have different strategies for solving a specific problem, and humans will choose the strategy by considering both accuracy and the cognitive effort they are going to make. In the end, it is dependent on the cognitive capacity of human decision makers.

In addition to the above conjectures and theories, more concrete evidence has been observed in economic laboratories. Devetag & Warglien (2003) found a significant and positive correlation between subjects' short-term memory scores and conformity to standard game-theoretic prescriptions in the games. Benjamin, Brown, & Shapiro (2006) imposed a cognitive load manipulation by asking subjects to remember a seven-digit number
while performing the task. The results showed that the cognitive load manipulation caused a statistically significant increase in one of two measures of small-stakes risk aversion. Devetag & Warglien (2008) pointed out that subjects construct representations of games of different relational complexity and will play the games according to these representations. Their experimental results showed that both the differences in the ability to correctly represent the games and the heterogeneity of the depth of iterated thinking in games appear to be correlated with short-term memory capacity.

Consequently, we choose working memory capacity as the representative of cognitive capacity in this study. In order to obtain this inherent variable, we will give several tests to our subjects to measure their working memory capacity, apart from manipulating it by imposing cognitive-loading tasks.

Cognitive Ability in Agent-Based Models

We have seen from the last section that cognitive capacity, or more specifically working memory, could be an important factor that has an influence on learning capability. Nevertheless, it is rarely mentioned in agent-based economic models. To the best of our knowledge, the only exception is Casari (2004)'s model of adaptive learning agents with limited working memory. Casari (2004) used Genetic Algorithms (GA) as the agents' learning algorithm, and made use of the size of each agent's strategy set. The results show that the model replicates most of the patterns found in common property resource experiments.

Inspired by these psychological studies, we adopt a similar mechanism to model learning agents' cognitive capacity in this research. We employ GP as the traders' learning algorithm so that they can construct new strategies or modify old ones based on past experiences. The limits of working memory are concretized as GP traders' population sizes. The bigger the population a GP trader has, the more capable it is of handling various concepts and structures to form its trading strategies.

EXPERIMENTAL DESIGN

In this chapter, we will report two kinds of experimental results. One is from agent-based simulations, and the other is from human experiments.

The idea of the agent-based simulations in this chapter is to understand human dynamics with the tool of GP agents. The purpose of such simulations is two-fold. First, we let GP agents compete with other software trading strategies to see the potential of learning agents, and observe the conditions under which learning agents can defeat other designed strategies. Second, we can test the influence of cognitive capacity by imposing different population sizes on our GP learning agents. Such manipulation of cognitive capacity is almost impossible with human subjects, and thus it will be very informative if we can have simulated results before we eventually perform human experiments.

A human experiment is conducted after the simulations. The results of the human experiments will be compared with the simulation results to verify whether we have found an adequate way to model human learning processes. The parameters and the design of both experiments are presented in this section.

Market Mechanism

Experiments in this chapter were conducted on the AIE-DA (Artificial Intelligence in Economics - Double Auction) platform, which is an agent-based discrete double auction simulator with built-in software agents.

AIE-DA is inspired by the Santa Fe double auction tournament held in 1990, and in this study we adopted the same token generation process as
AIE-DA is inspired by the Santa Fe double auction tournament held in 1990, and in this study we adopted the same token generation process as in Rust, Miller, & Palmer (1993, 1994)'s design. Our experimental markets consist of four buyers and four sellers. Each of the traders can be assigned a specific strategy, either a designed trading strategy or a GP agent.

During the transactions, traders' identities are fixed, so they cannot switch between buyers and sellers. Each trader has four units of commodities to buy or to sell, and can submit only once for one unit of commodity at each step in a trading day. Every simulation lasts 7,000 trading days, and each trading day consists of 25 trading steps. AIE-DA is a discrete double auction market and adopts AURORA trading rules such that at most one pair of traders is allowed to make a transaction at each trading step. The transaction price is set to be the average of the winning buyer's bid and the winning seller's ask.

At the beginning of each simulation, each trader is randomly assigned either a designed trading strategy or a GP agent. Traders' tokens (reservation prices) are also randomly generated with random seed 6453. Therefore, each simulation starts with a new combination of traders and a new demand and supply schedule.
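To make the clearing rule concrete, the following is a minimal Python sketch of one trading step under the rules described above (one transaction per step, transaction price set to the midpoint of the winning bid and ask). The function name and the assumption that a trade only occurs when the best bid is at least the best ask are illustrative choices, not taken from the AIE-DA code.

def clear_one_step(bids, asks):
    """One trading step of a discrete double auction: at most one pair trades.

    bids, asks: dicts mapping trader id -> submitted price.
    Returns (buyer, seller, price) or None when no transaction occurs.
    Illustrative sketch only; the actual AIE-DA rules may differ in detail.
    """
    if not bids or not asks:
        return None
    buyer = max(bids, key=bids.get)     # winning (highest) bid
    seller = min(asks, key=asks.get)    # winning (lowest) ask
    if bids[buyer] < asks[seller]:      # assumed: a trade requires the bid to cross the ask
        return None
    price = (bids[buyer] + asks[seller]) / 2.0   # midpoint rule described in the text
    return buyer, seller, price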


traders’ bids or asks distribution and then as traders’ own reservation prices, current market
compute the expected profit based on their shouts, and average price in the last period, etc.
own reservation prices. Hence their bids/ We adopt the same design of genetic operations
asks simply equal their reservation prices as well as terminal and function sets of GP trad-
minus/plus the expected profit. Finally, ers as Chen, Chie, & Tai (2001) describes, apart
BGAN traders employ Bayesian updating from a different fitness calculation. The fitness
procedures to update their prior beliefs. value of GP traders is defined as the individual
• Easley-Ledyard (EL) from Easley & efficiency achieved, which will be explained later
Ledyard (1993): EL traders balance the in this chapter.
profit and the probability of successfully We did not train our GP traders before they
making transactions by placing aggres- were sent to the double auction tournament. At
sive bids or asks in the beginning, and the beginning of every trading day, each GP trader
then gradually decrease their profit margin randomly picks a strategy from his/her population
when they observe that they might lose of strategies and uses it throughout the whole
chances based on other traders’ bidding day. The performance of each selected strategy
and asking behavior. is recorded, and if a specific strategy is selected
• Empirical strategy is inspired by Chan, more than once, a weighted average will be taken
LeBaron, Lo, & Poggio, and it works in the to emphasize later experiences.
same way as Friedman’s BGAN but devel- GP traders’ strategies are updated–with selec-
ops its belief by constructing histograms tion, crossover, and mutation–every N days, where
from opponents’ past shouted prices. N is called the “select number.” To avoid the flaw
that a strategy is deserted simply because it was not
Named by or after their original designers, selected, we set N as twice the size of the popu-
these strategies were modified to accommodate lation so that theoretically each strategy has the
our discrete double auction mechanism in vari- chance to be selected twice. Tournament selection
ous ways. They were modified according to their is implemented and the size of the tournament is
original design concepts as much as possible. As 5, however big the size of the population is. We
a result, they might not be 100% the same as they also preserve the elite for the next generation,
originally were. and the size of the elite is 1.1 The mutation rate
Although most of the strategies were cre- is 5%, in which 90% of this operation consists of
ated for the purpose of studying price formation a tree mutation.2
processes, we still sent them to the “battlefield”
because they can represent, to a certain degree, Experimental Procedures
various types of trading strategies which can be
observed in financial market studies. Since we have only eight traders (four buyers
and four sellers) in the market while there are
GP Trading Agents twelve trading strategies to be tested, we have to
compare these strategies by randomly sampling
GP agents in this study adopt only standard (without replacement) eight strategies and inject
crossover and mutation operations, by which it is them into the market one at a time. However,
meant that no election, ADFs (Automatic Defined considering the vast amount of combinations and
Functions), nor other mechanisms are imple- permutations of strategies, we did not try out all
mented. We provide GP traders with simple but the possibilities. Instead, 300 random match-ups
basic market information as their terminals, such were created for each series of experiment. In


Experimental Procedures

Since we have only eight traders (four buyers and four sellers) in the market while there are twelve trading strategies to be tested, we have to compare these strategies by randomly sampling (without replacement) eight strategies and injecting them into the market one at a time. However, considering the vast number of combinations and permutations of strategies, we did not try out all the possibilities. Instead, 300 random match-ups were created for each series of experiments. In each of these match-ups, any selected strategy will face strategies completely different from its own kind. That is, a certain type of strategy such as ZIC will never meet another ZIC trader in the same simulation. Thus, there is at most one GP trader in each simulated market, and this GP trader adjusts its bidding/asking behavior by learning from the other kinds of strategies. There is no co-evolution among GP traders in our experiments.

In order to examine the validity of using population sizes as GP traders' intelligence, a series of experiments was conducted, and GP traders' population sizes were set at 5, 20, 30, 40, 50, 60, 70, 80, 90, and 100, respectively. As a result, we carried out 10 multi-agent experiments for different-size GP traders. In each experiment, there are 300 simulations due to the random match-ups of strategies. In each simulation, the same market demand and supply is chosen and kept constant throughout the 7,000 trading days. In each trading day, buyers' and sellers' tokens are replenished so that they can start over for another 25 trading steps.

Human Subject Experiments

The final stage of this research is to compare learning agents' behavior with human traders' learning dynamics. Therefore, we have to carry out corresponding experiments on agent simulations and human subject experiments.

In a way that is different from the 10 agent-based experiments described above, we manually choose 3 different market structures to test both our GP agents and human traders. The demand and supply schedules of these markets are shown in Figure 1. These markets are chosen because of their unique properties: market 1 (M1) is a symmetric market where buyers and sellers share the market surplus, and half of each trader's tokens are intramarginal; market 2 (M2) is asymmetric and competitive in that each trader has only one intramarginal token, but buyers can compete to grasp the chance to exchange with the lowest four units of commodities; market 3 (M3) is a symmetric market similar to M1, but it is more profitable and there may be more space for strategic behavior.

Figure 1. The Demand and the Supply Curves of Human Experiments. From the top to the bottom these are M1, M2, and M3, respectively.


In order to test the GP agents and human agents, we propose an environment where the opponents of GP traders and human traders are so simple that we can exclude many possible factors from interfering with the comparisons. To achieve this, we simply use truth tellers as the GP traders' and human traders' opponents. As a result, each GP/human trader faces seven truth tellers in his/her own market experiment.

Another important factor to test in the human experiments is the effect of cognitive capacity on learning. We adopt working memory as the measure of cognitive capacity, and assess subjects' working memory capacity with a series of computerized psychological tests (Lewandowsky, Oberauer, Yang, & Ecker, 2009). The working memory tests employed here are:

• SS: Sentence Span test
• MU: Memory Updating test
• OS: Operation Span test
• SSTM: Spatial Short-term Memory test
• BDG: Backward Digit Span test

These tests are carried out after the human subjects' double auction experiments so as to avoid any suggestive influence on their behavior in the double auction markets.

Human subjects are randomly assigned to one of the market traders, and remain in that position throughout the three-market experiment so that they can learn from repeated experimentation.

ANALYSIS OF SIMULATIONS AND HUMAN EXPERIMENTS

In this section, we will report and analyze the results of the multi-agent simulations as well as the human experiments. Before proceeding with the actual analysis, we are going to acquaint the readers with the evaluation methods used in this chapter.

In this research, we evaluate the traders' performances from a profit-variation point of view. Profitability is measured in terms of individual efficiencies.

Figure 2. Individual Surplus

Considering the inequality in each agent's endowment due to the random matching of strategies as well as random reservation prices, direct comparisons of raw profits might be biased, since luck may play a very important role. To overcome this problem, a general index which can evaluate traders' relative performances in all circumstances is necessary. The idea of individual efficiency meets this requirement, and it can be illustrated by Figure 2. The individual surplus, which is the sum of the differences between one's intramarginal reservation prices and the market equilibrium price, measures the potential profit endowed by the specific position of a trader in the market. Individual efficiency is calculated as the ratio of one's actual profits to one's individual surplus, and thus measures the ability of a trader to exploit the potential interest endowed in various circumstances. As demonstrated in Equation 1, individual efficiency is a ratio and can be easily compared across simulations without other manipulations such as normalization.

In addition to profits, a strategy's profit stability is also taken into account, because in double auction markets the variation in profits might be considered in human trading strategies, which are determined by the human's risk attitudes. Here we capture the variation of each strategy by calculating the standard deviation of its individual efficiencies.
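Equation 1 is cited above but is not reproduced in this text. Based on the definition just given, individual efficiency for trader i can be written as the following ratio; the notation below is chosen here for illustration and is a reconstruction, not the authors' original typesetting:

IE_i = \frac{\pi_i}{IS_i}, \qquad IS_i = \sum_{k \in \mathcal{T}_i} \lvert v_{i,k} - p^{*} \rvert,

where \mathcal{T}_i is the set of trader i's intramarginal tokens, v_{i,k} are the corresponding reservation prices, \pi_i is the trader's realized profit, and p^{*} is the market equilibrium price.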


Learning Capabilities of GP Agents

In investigating the GP traders' learning capability, we simply compare GP agents with designed strategies collected from the literature. We are interested in the following questions:

1. Can GP traders defeat other strategies?
2. How many resources are required for GP traders to defeat other strategies?

GP traders with population sizes of 5, 20, and 50 are sampled to answer these questions. Figure 3 is the result of this experiment. Here we represent GP traders of population sizes 5, 20, and 50 with P5, P20, and P50, respectively. We have the following observations from Figure 3:

• No matter how big the population is, GP traders can gradually improve and defeat other strategies.
• GP traders can still improve themselves even under the extreme condition of a population of only 5. The fact that the tournament size is also 5 means that strategies in the population might converge very quickly. Figure 4 shows the evolution of the average complexity of GP strategies. In the case of P5, the average complexity almost equals 1 at the end of the experiments, meaning that GP traders could still gain superior advantages by constantly updating their strategy pools composed of very simple heuristics. In contrast with P5, in the case of bigger populations, GP develops more complex strategies as time goes by.
• What is worth noticing is that GP might need a period of time to evolve. The bigger the population, the fewer the generations that are needed to defeat other strategies. In any case, it takes hundreds to more than a thousand days to achieve good performances for GP traders.
• Figure 3 also shows the results from a profit-variation viewpoint. Other things being equal, a strategy with higher profit and less variation is preferred. Therefore, one can draw a frontier connecting the most efficient trading strategies. Figure 3 shows that GP traders, although with more variation in profits in the end, always occupy the ends of the frontier.3

The result of this experiment shows that learning GP traders can outperform other (adaptive) strategies, even if those strategies may have a more sophisticated design.

Cognitive Capacity and Learning Speed

Psychologists tell us that the intelligence of human beings involves the ability to "learn quickly and learn from experiences" (Gottfredson, 1997). To investigate the influence of individual intelligence on learning speed, we think of a GP trader's population size as a proxy for his/her cognitive capacity. Is this parameter able to generate behavioral outcomes consistent with what psychological research tells us?

Figure 5 delineates GP traders' learning dynamics with a more complete sampling. Roughly speaking, we can see that the bigger the population size, the less time that GP traders need to perform well. In other words, GP traders with higher cognitive capacity tend to learn faster and consequently gain more wealth.

However, if we are careful enough, we may also notice that this trend is not as monotonic as we might think. It seems that there are three groups of learning dynamics in this figure. From P5 to P30, there exists a clearly positive relationship between "cognitive capacity" and performance. P40 and P50 form the second group: they are not very distinguishable, but both of them are better than traders with lower "cognitive capacity". The most unexplainable part is P60 to P100. Although


Figure 3. Comparison of GP Traders with Designed Strategies. From the top to the bottom rows are
comparisons when GP traders’ population sizes are 5, 20, and 50, respectively. (a) The left panels of
each row are the time series of individual efficiencies. (b) The right panels of each row are the profit-
variation evaluation on the final trading day. The horizontal axis stands for their profitability (individual
efficiency, in percentage terms), and the vertical axis stands for the standard deviation of their profits.
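The "frontier" in the observations above can be read as a Pareto frontier in the profit-variation plane (higher individual efficiency, lower standard deviation). The following small sketch, with made-up numbers, only illustrates how such a non-dominated set could be extracted; it is an interpretation of the idea, not the procedure used to produce Figure 3.

def pareto_frontier(points):
    """Return the strategies not dominated by any other.

    points: dict mapping strategy name -> (efficiency, std_dev).
    A point dominates another if it has efficiency at least as high and
    standard deviation at least as low, with at least one strict inequality.
    """
    frontier = []
    for name, (eff, sd) in points.items():
        dominated = any(
            (e2 >= eff and s2 <= sd) and (e2 > eff or s2 < sd)
            for other, (e2, s2) in points.items() if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

# hypothetical numbers, not taken from the chapter's results
example = {"P50": (97.0, 8.0), "GD": (95.0, 5.0), "ZIP": (90.0, 4.0), "ZIC": (80.0, 3.0)}
print(pareto_frontier(example))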


Figure 4. The Average Complexities of GP Strategies. The GP traders’ population sizes are 5, 20, and 50,
respectively (from the left panel to the right panel). The complexity is measured in terms of the number
of terminal nodes and function nodes of GP traders’ strategy parse trees.

Figure 5. GP Traders’ Performances at Different Levels of Cognitive Capacity. The horizontal axis
denotes generations; the vertical axis consists of the individual efficiencies obtained by GP traders.

Table 1. Wilcoxon Rank Sum Tests for GP Traders’ Performances on Individual Efficiencies

P5 P20 P30 P40 P50 P60 P70 P80 P90 P100


P5 X
P20 0.099* X
P30 0.010** 0.328 X
P40 0.002** 0.103 0.488 X
P50 0.000** 0.009** 0.129 0.506 X
P60 0.000** 0.000** 0.003** 0.034** 0.130 X
P70 0.000** 0.000** 0.015** 0.121 0.355 0.536 X
P80 0.000** 0.000** 0.003** 0.036** 0.131 1.000 0.558 X
P90 0.000** 0.000** 0.011** 0.079* 0.250 0.723 0.778 0.663 X
P100 0.000** 0.000** 0.000** 0.002** 0.009** 0.284 0.093* 0.326 0.150 X
“ * ” denotes significant results under the 10% significance level; “ ** ” denotes significant results under the 5% significance level.


this group apparently outperforms traders with lower "cognitive capacity," the inner-group relationship between "cognitive capacity" and performance is quite obscure.

For a better understanding of this phenomenon, a series of nonparametric statistical tests was performed on these simulation results. The outcomes of these tests are presented in Table 1. Pairwise Wilcoxon Rank Sum Tests show that when the "cognitive capacity" levels are low, small differences in cognitive capacity may result in significant differences in final performances. On the contrary, among those who have high cognitive capacity, differences in cognitive capacity do not seem to cause any significant discrepancy in performances. Therefore, there seems to be a decreasing marginal contribution of cognitive capacity in terms of performance.

This phenomenon is analogous to what the psychological literature has pointed out: high intelligence does not always contribute to high performance; the significance of intelligence for performance is more salient when the problems are more complex. As to the decreasing marginal value of intelligence, please see Detterman & Daniel (1989) and Hunt (1995).

Human Subject Experiments

As mentioned in the section on experimental design, we conduct multi-agent simulations with GP traders for the three markets specified in Figure 1. In order to make it easier to observe and compare, we choose GP traders with population sizes based on a log scale: 5, 25, and 125. Figure 6 depicts the evolution of GP traders' performance over time.

As Figure 6 shows, GP traders learn very quickly, but they attain different levels of individual efficiency in different markets. Does GP traders' cognitive capacity (population size) play any decisive role in their performances? To give a more precise description, detailed test statistics are computed and the output is presented in Table 2.

It is shown in Table 2 that GP traders with different cognitive capacity do not have significant differences in their performances in market 1, while the differences in their cognitive capacity do bring about significant discrepancies in final performances in market 3: the bigger the population size, the better the results they can achieve. Market 2 is somewhere in between market 1 and market 3.

After a quick overview of GP traders' performance in these three markets, we now turn our attention to the results of the human experiments. Unlike GP traders, it is impossible to know human traders' true cognitive capacity directly. Fortunately, we can get access to it via various tests which have been validated by psychologists. In our human experiments, we have twelve human subjects recruited from among graduate and undergraduate students. We measure their cognitive capacity with five working memory tests (see Table 3). In Table 3, we normalize subjects' working memory scores so that a negative number means their working memory capacity is below the average of the twelve subjects.

Each subject was facing seven truth-telling opponents in his/her own auction market, and three markets (M1, M2, and M3, see Figure 1) were experienced in order by each trader. The dynamics of the human traders' performance in terms of individual efficiency is plotted in Figure 7.

We have several observations from Figure 7:

1. Human traders have quite diverse learning patterns in market 1 and market 2, but the patterns in market 3 appear to be more similar. This may be due to the idiosyncrasies of the markets, or it may be due to the learning effect of human traders, so that they have come up with more efficient strategies market by market.
2. Although there are some exceptions, human traders who have above-average working memory capacity seem to have better performance than those with below-average working memory capacity. We can see from


Figure 6. GP Traders’ Performances over Time in Market 1, Market 2, and Market 3. The horizontal
axis denotes generations; the vertical axis consists of the individual efficiencies (in percentage terms)
obtained by GP traders.


Table 2. Wilcoxon Rank Sum Tests for GP Traders' Performances in M1, M2, and M3

P25 vs. P5: 0.2168 (M1), 0.3690 (M2), 0.004758** (M3)
P125 vs. P5: 0.1416 (M1), 0.003733** (M2), 0.00000007873** (M3)
P125 vs. P25: 0.3660 (M1), 0.1467 (M2), 0.0004625** (M3)

The numbers are the p-values for the null hypothesis of no influence resulting from the difference in population sizes, reported for M1, M2, and M3 respectively. "*" denotes significant results under the 10% significance level; "**" denotes significant results under the 5% significance level.
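Pairwise tests like those reported in Tables 1 and 2 can be computed, for example, with SciPy's rank-sum test. The sketch below uses placeholder arrays and is only meant to show the shape of the computation; it is not the authors' code.

import numpy as np
from scipy.stats import ranksums

# placeholder data: final-day individual efficiencies from the 300 simulations
# of two population-size treatments (values are made up)
eff_p5 = np.random.uniform(60, 95, size=300)
eff_p100 = np.random.uniform(80, 100, size=300)

stat, p_value = ranksums(eff_p5, eff_p100)   # Wilcoxon rank-sum test
print(f"rank-sum statistic = {stat:.3f}, p-value = {p_value:.4f}")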

Table 3. Working Memory Capacity of Human Subjects

Subject SS OS MU SSTM BDG WMC Rank


B -0.36 -2.48 0.29 0.27 -1.13 -0.68 10
D -1.79 -0.12 0.67 -0.28 1.06 -0.09 8
F 0.32 -0.59 -0.94 -1.15 -2.03 -0.88 11
G 0.65 -0.24 0.66 -0.05 0.68 0.34 4
H -2.15 0.63 -1.42 -1.70 -0.48 -1.02 12
I 0.83 1.53 1.29 0.74 0.93 1.06 1
K 0.57 0.76 -0.17 0.35 0.55 0.41 3
L 0.06 0.35 -0.28 0.74 0.29 0.23 5
O 0.22 0.54 1.21 1.68 0.93 0.92 2
R 0.47 -0.35 0.90 0.19 -1.00 0.04 7
S 0.08 -0.55 -1.68 0.66 -0.48 -0.39 9
T 1.10 0.52 -0.53 -1.46 0.68 0.06 6
Test scores of each test item are normalized, and the final scores (working memory capacity, WMC) are obtained by averaging these five scores.
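A minimal sketch of the scoring rule stated in the note to Table 3 (z-score each test across the twelve subjects, then average the five z-scores per subject). The array layout and the raw values are assumptions made for illustration.

import numpy as np

# rows = the twelve subjects, columns = the five tests (SS, OS, MU, SSTM, BDG); values are made up
raw_scores = np.random.rand(12, 5) * 20

z_scores = (raw_scores - raw_scores.mean(axis=0)) / raw_scores.std(axis=0)  # normalize each test
wmc = z_scores.mean(axis=1)               # composite working memory capacity per subject
ranks = (-wmc).argsort().argsort() + 1    # rank 1 = highest WMC, as in Table 3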

the figure that the solid series tend to lie in a higher position than the dashed series.

3. On average, it takes human traders fewer than six trading periods to surpass 90%. GP traders' learning speed is about the same in terms of generations: it takes GP traders fewer than ten generations to achieve similar levels. However, we have to notice that a GP trader's generation consists of several trading periods: 10 periods for P5, 50 periods for P25, and 250 periods for P125. Thus there seems to be a big difference between their learning speeds. If we are going to have GP traders compete with human traders in the same market, we can obviously observe the difference in their learning speeds.

Although there is a difference in the GP traders' and human traders' learning speeds, suggesting that human traders may have different methods or techniques of updating and exploring their strategies, it is still possible to modify GP traders to catch up with human traders. However, what is important in this research is to delve into


Figure 7. Human Traders’ Performances over Time in the Three Markets. Solid series are the learning
dynamics of human traders whose working memory capacity is above average; dashed series are the
learning dynamics of traders whose memory capacity is below average (the left panel). The right panels
are the average performances of above- and below-average traders. The horizontal axis denotes genera-
tions; the vertical axis denotes the individual efficiencies (in percentage terms) obtained by the traders.

the relationship between cognitive capacity and learning behavior. Do our GP traders exhibit patterns corresponding to those of the human traders?

Since we have seen that GP traders' cognitive capacity does not play a significant role in market 1 and market 2, but that it has a positive and significant influence on performance in market 3, will it be the same in our human experiment? Because we cannot precisely categorize human traders according to their normalized scores, we choose to run linear regressions to see how working memory capacity contributes to human traders' performances. The estimated effects are not significant, and the explanatory power of the regression model is very poor. However, we can go back to the raw data and see what might be neglected in our analysis.
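The kind of regression reported in Table 4 (individual efficiency regressed on the working memory composite, market by market) could be run along the following lines. The variable names and data are placeholders, and this is a sketch under those assumptions rather than the authors' estimation code.

import numpy as np
from scipy.stats import linregress

# placeholder data for one market: working memory composite and final individual efficiency (%)
wmc = np.array([-1.0, -0.6, -0.3, 0.0, 0.2, 0.5, 0.8, 1.0, 1.3])
efficiency = np.array([82.0, 85.0, 88.0, 90.0, 91.0, 94.0, 95.0, 97.0, 98.0])

res = linregress(wmc, efficiency)    # efficiency = intercept + slope * wmc
print(f"intercept={res.intercept:.2f}, slope={res.slope:.2f}, "
      f"p-value={res.pvalue:.3f}, R^2={res.rvalue**2:.3f}")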


Table 4. Results of Linear Regression of Working Memory Capacity on Human Traders' Performances

M1: intercept = 91.444** (std. error 9.501, t = 9.625, p = 2.75e-05); wmc = 31.673 (std. error 16.960, t = 1.868, p = 0.104); multiple R-squared = 0.3326, adjusted R-squared = 0.2372
M2: intercept = 83.44** (std. error 12.19, t = 6.847, p = 0.000243); wmc = 16.08 (std. error 21.76, t = 0.739, p = 0.483859); multiple R-squared = 0.0724, adjusted R-squared = -0.06012
M3: intercept = 96.556** (std. error 3.313, t = 29.144, p = 1.44e-08); wmc = 11.701* (std. error 5.914, t = 1.978, p = 0.0884); multiple R-squared = 0.3586, adjusted R-squared = 0.267

"*" denotes significant results under the 10% significance level; "**" denotes significant results under the 5% significance level.

When we try to emphasize the potential influence of cognitive capacity (here meaning working memory capacity) on human traders' performances, we are suggesting that cognitive capacity may play a key role in the processing of information as well as in the combination and construction of strategies. The assumption here is that people have to get acquainted with the problems and form their strategies from the beginning. However, this may be far from true, because people come to the lab with different background knowledge and different experiences, and experimentalists can control this by excluding experienced subjects from participating in similar experiments. But how can experienced subjects be excluded if their experience does not come from participating in similar experiments before?

In this study, we can approach this problem by excluding subjects who have participated in markets which use double auctions as their trading mechanisms. From a survey after the experiments, we can identify three subjects who have experience in stock markets or futures markets.4 Following this logic, we re-examine the relationship between working memory capacity and human traders' performance, and the results are shown in Table 4.

As Table 4 shows, working memory capacity only has a significant influence on traders' performances in market 3. We can compare this with the GP traders' results shown in Table 2. The results from the GP simulations tell us that the influences of cognitive capacity are significant in market 3, while they are insignificant in market 1, and only significant in market 2 when the difference in cognitive capacity is 25 times as large. In brief, we have very similar patterns for the GP trader simulations and the human subject experiments.

Does this prove anything related to our research goals? We have to realize that there are limitations on our analysis so far. These limits may not necessarily work against our analytical results, but they suggest that we need more experiments and more evidence to clarify the entwined effects occurring during human traders' decision-making processes. We name several of them as follows:

1. The number of human subjects greatly limits the validity of our research. As a result, we have to conduct more human experiments to gain stronger support for our analytic results.
2. Does the significance of working memory capacity appear because of its influence on decision making, or is it because of a learning effect taking place as human subjects proceed from market 1 to market 3? We cannot identify the effects of these various possible channels.
3. What does the pattern observed in the GP simulations and human experiments mean, even if working memory capacity can really


influence economic decision making? Why does it not matter to have a larger working memory capacity in market 1 and market 2, and for it to become important in market 3?

Regarding the first point, it is quite striking that we can greatly increase the significance of our results simply by excluding three subjects. By recruiting more inexperienced subjects into our experiments, we may have more confidence in interpreting the results, either with positive proof or negative rejection.

As to the second point, there is no way but to collaborate more closely with psychologists. The effects and models of working memory have been studied by psychologists for such a long time that economists can get instant and clear help from their expertise in this. However, it will not be enough just to consult psychologists for their opinions; tighter collaboration is a must from the design phase to the analytical stage, because of the dissimilarities in the experimental logic, target problems, etc. between these two disciplines.

The final point, which is also the most concrete finding in the last part of our research, requires more contemplation. There are economic reasons for tackling this problem. Given the assumption that working memory capacity affects economic performance, such results suggest that different market problems may pose different kinds of problems to decision makers. It might be that it takes cognitive capacity to solve some of the problems, while this is not the case for other problems. Or, it may be that the potentially effective strategies in market 3 require more cognitive capacity to be accomplished, while in other markets simple heuristics are already sufficiently profitable. If either of these explanations can be proved to be true, it will be very informative for economic experimentalists in the future, in the sense that they can deploy agent simulations to understand the problem better before human experiments are launched.

CONCLUDING REMARKS

The significance of this chapter resides in two facets. First, it raises the issue of heterogeneity in individual cognitive capacity, since most agent-based economic or financial models do not deal with it. Second, the research strategy we adopt in this chapter reveals the value of multi-directional and reciprocal relationships among agent-based computational modeling, experimental economics, and psychology.

In this chapter, we propose a method to model individual intelligence in agent-based double auction markets. We then run a series of experiments to validate our results according to what psychological studies have shown us.

Simulation results show that it is viable to use population size as a proxy for the cognitive capacity of GP traders. In general, the results are consistent with psychological findings (a positive relationship between intelligence and learning performance, and a decreasing marginal contribution of extra intelligence) and with our human experiments (the patterns of the influence of working memory capacity on agents' performances). Our study therefore shows that, by employing Genetic Programming as the learning algorithm, it is possible to model both the individual learning behavior and the innate heterogeneity of individuals at the same time.

The results of this study remind us of the possibility that there is another facet connecting human intelligence and artificial intelligence. Artificial intelligence not only can be used to model intellectual behavior individually, but it is also able to capture social heterogeneity through a proper parameterization.

The other contribution of this chapter is the relationship between human experiments, agent-based simulations, and psychology. It is already known that agent-based simulation is not just a complementary tool for when it is too costly to conduct human experiments, for it can also help us test and verify economic theories without human


experiments. However, even when they are 'combined' together, human experiments are always the counselors of agent-based models, just as Duffy (2006) observes:

"with a few notable exceptions, researchers have not sought to understand findings from agent-based simulations with follow-up experiments involving human subjects. The reasons for this pattern are straightforward. … As human subject experiments impose more constraints on what a researcher can do than do agent-based modeling simulations, it seems quite natural that agent-based models would be employed to understand laboratory findings and not the other way around." (Duffy, 2006, p. 951)

Human experiments may be greatly constrained, but at the same time there are so many unobservable but intertwining factors functioning during human subjects' decision processes. On the other hand, agent-based models can be strictly controlled, and software agents are almost transparent. Thus, it would be an advantage to turn to agent-based simulations if researchers want to isolate the influence of a certain factor. In this regard, we think that agent-based simulations can also be a tool to discover unknown factors even before human experiments are conducted.

In this chapter, we actually follow this strategy by eliciting ideas from the psychological literature first, transplanting them into an economics environment in the form of agent-based simulations, and finally conducting corresponding human experiments after we have gained support from the agent-based simulations. However, what we mean by multi-directional relationships among agent-based simulation, human experiments, and psychology has a deeper meaning. We believe that the knowledge from these three fields has a large space for collaboration, but it should be done not only by referring to the results of each other as a final source of reference. These three disciplines each possess an experimental nature, and cyclical joint work including the design phase should be expected. We anticipate that researchers can acquire more precise findings through experimentation with the help of human subjects and software agents in the way presented in this chapter.

ACKNOWLEDGMENT

The authors are grateful to an anonymous referee for very helpful suggestions. The authors are also thankful to Prof. Lee-Xieng Yang in the Research Center of Mind, Brain, and Learning of National Chengchi University for his professional support and for the test programs of working memory capacity. The research support in the form of NSC grants no. NSC 95-2415-H-004-002-MY3 and NSC 96-2420-H-004-016-DR from the National Science Council, Taiwan, is also gratefully acknowledged.

REFERENCES

Benjamin, D., Brown, S., & Shapiro, J. (2006). Who is 'behavioral'? Cognitive ability and anomalous preferences. Levine's Working Paper Archive 122247000000001334, UCLA Department of Economics.

Casari, M. (2004). Can genetic algorithms explain experimental anomalies? An application to common property resources. Computational Economics, 24, 257–275. doi:10.1007/s10614-004-4197-5

Chan, N. T., LeBaron, B., Lo, A. W., & Poggio, T. (2008). Agent-based models of financial markets: A comparison with experimental markets. MIT Artificial Markets Project, Paper No. 124, September. Retrieved January 1, 2008, from http://citeseer.ist.psu.edu/chan99agentbased.html


Chen, S.-H., Chie, B.-T., & Tai, C.-C. (2001). Evolving bargaining strategies with genetic programming: An overview of AIE-DA Ver. 2, Part 2. In B. Verma & A. Ohuchi (Eds.), Proceedings of the Fourth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA 2001) (pp. 55–60). IEEE Computer Society Press.

Chen, S.-H., Zeng, R.-J., & Yu, T. (2008). Co-evolving trading strategies to analyze bounded rationality in double auction markets. In Riolo, R., Soule, T., & Worzel, B. (Eds.), Genetic Programming Theory and Practice VI (pp. 195–213). Springer.

Cliff, D., & Bruten, J. (1997). Zero is not enough: On the lower limit of agent intelligence for continuous double auction markets (Technical Report no. HPL-97-141). Hewlett-Packard Laboratories. Retrieved January 1, 2008, from http://citeseer.ist.psu.edu/cliff97zero.html

Das, R., Hanson, J. E., Kephart, J. O., & Tesauro, G. (2001). Agent-human interactions in the continuous double auction. In Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI). San Francisco, CA: Morgan-Kaufmann.

Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are highest for low-IQ groups. Intelligence, 13, 349–359. doi:10.1016/S0160-2896(89)80007-8

Devetag, G., & Warglien, M. (2003). Games and phone numbers: Do short-term memory bounds affect strategic behavior? Journal of Economic Psychology, 24, 189–202. doi:10.1016/S0167-4870(02)00202-7

Devetag, G., & Warglien, M. (2008). Playing the wrong game: An experimental analysis of relational complexity and strategic misrepresentation. Games and Economic Behavior, 62, 364–382. doi:10.1016/j.geb.2007.05.007

Duffy, J. (2006). Agent-based models and human subject experiments. In Tesfatsion, L., & Judd, K. (Eds.), Handbook of Computational Economics (Vol. 2). North Holland.

Easley, D., & Ledyard, J. (1993). Theories of price formation and exchange in double oral auctions. In Friedman, D., & Rust, J. (Eds.), The Double Auction Market: Institutions, Theories, and Evidence. Addison-Wesley.

Friedman, D. (1991). A simple testable model of double auction markets. Journal of Economic Behavior & Organization, 15, 47–70. doi:10.1016/0167-2681(91)90004-H

Gjerstad, S., & Dickhaut, J. (1998). Price formation in double auctions. Games and Economic Behavior, 22, 1–29. doi:10.1006/game.1997.0576

Gode, D., & Sunder, S. (1993). Allocative efficiency of markets with zero-intelligence traders: Market as a partial substitute for individual rationality. The Journal of Political Economy, 101, 119–137. doi:10.1086/261868

Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52 signatories, history, and bibliography. Intelligence, 24(1), 13–23. doi:10.1016/S0160-2896(97)90011-8

Gregg, L., & Simon, H. (1979). Process models and stochastic theories of simple concept formation. In H. Simon, Models of Thought (Vol. I). New Haven, CT: Yale University Press.


Grossklags, J., & Schmidt, C. (2006). Software agents and market (in)efficiency: A human trader experiment. IEEE Transactions on Systems, Man, and Cybernetics: Part C, Special Issue on Game-theoretic Analysis & Simulation of Negotiation Agents, 36(1), 56–67.

Hunt, E. (1995). The role of intelligence in modern society. American Scientist, (July/August), 356–368.

Kagel, J. (1995). Auctions: A survey of experimental research. In Kagel, J., & Roth, A. (Eds.), The Handbook of Experimental Economics. Princeton University Press.

Lewandowsky, S., Oberauer, K., Yang, L.-X., & Ecker, U. (2009). A working memory test battery for Matlab. Under preparation for submission to the Journal of Behavioral Research Method.

Payne, J., Bettman, J., & Johnson, E. (1993). The Adaptive Decision Maker. Cambridge University Press.

Rust, J., Miller, J., & Palmer, R. (1993). Behavior of trading automata in a computerized double auction market. In Friedman, D., & Rust, J. (Eds.), Double Auction Markets: Theory, Institutions, and Laboratory Evidence. Redwood City, CA: Addison Wesley.

Rust, J., Miller, J., & Palmer, R. (1994). Characterizing effective trading strategies: Insights from a computerized double auction tournament. Journal of Economic Dynamics & Control, 18, 61–96. doi:10.1016/0165-1889(94)90069-8

Simon, H. (1981). Studying human intelligence by creating artificial intelligence. American Scientist, 69, 300–309.

Simon, H. (1996). The Sciences of the Artificial. Cambridge, MA: MIT Press.

Taniguchi, K., Nakajima, Y., & Hashimoto, F. (2004). A report of U-Mart experiments by human agents. In Shiratori, R., Arai, K., & Kato, F. (Eds.), Gaming, Simulations, and Society: Research Scope and Perspective (pp. 49–57). Springer.

Zhan, W., & Friedman, D. (2007). Markups in double auction markets. Journal of Economic Dynamics & Control, 31, 2984–3005. doi:10.1016/j.jedc.2006.10.004

KEY TERMS AND DEFINITIONS

Genetic Programming (GP): An automated method for creating a working computer program from a high-level statement of a problem. Genetic programming starts with a population of randomly created computer programs. This population of programs is progressively evolved over a series of generations. The evolutionary search uses the Darwinian principle of natural selection (survival of the fittest) and analogs of various naturally occurring operations, including crossover (sexual recombination), mutation, etc.

Cognitive Capacity: A general concept used in psychology to describe humans' cognitive flexibility, verbal learning capacity, learning strategies, intellectual ability, etc. Although cognitive capacity is a very general concept and can be measured from different aspects with different tests, concrete concepts such as intelligence quotient (IQ) and working memory capacity are considered highly representative of this notion.

Double Auction: A system in which potential buyers submit their bids and potential sellers submit their ask prices (offers) simultaneously. The market is cleared when a certain price P is chosen so that all buyers who bid more than P and all sellers who ask less than P are matched to make transactions.

Working Memory: The mental resources used in the decision-making processes of humans; it is highly related to general intelligence. It is


generally assumed that working memory has a constrained capacity; hence this capacity plays an important role which determines people's performance in cognitive tasks, especially complex reasoning ones.

Boundedly Rational Agents: Agents who experience limits in formulating and solving complex problems and in processing (receiving, storing, retrieving, transmitting) information; therefore, they solve problems by using certain heuristics instead of optimizing.

Individual Efficiency: A ratio used to evaluate agents' performance in the markets. In economic theory, once demand and supply determine the equilibrium price, an agent's potential profit (individual surplus) can be measured as the differences between his/her reservation prices and the equilibrium price. Individual efficiency is calculated as the ratio of the agent's actual profits over this potential profit.

ENDNOTES

1. Elitism preserves the best strategy in the current population to the next. While elitism helps preserve good strategies when there is no guarantee that every strategy will be sampled in our design, we don't want it to be the main factor determining the composition of the populations. Therefore, the number of elites is set to be 1.
2. Generally speaking, the larger the mutation rate, the more diverse the genotypes of the strategies are. In most studies, the mutation rate ranges from 1% to 10%; therefore it is set to be 5% in this research.
3. One may suspect that GP traders will perform very poorly from time to time, since they also have the biggest variances in profits. To evaluate how much worse GP traders can be, we keep track of the rankings of their performances relative to the other trading strategies. As a result, the average rankings of GP traders are the smallest among all the designed trading strategies. This means that although GP traders may use not-so-good strategies sometimes, their performances are still adequate as compared with the other kinds of designed trading strategies.
4. From the left panel of Figure 7, we can see that among the human traders with lower working memory capacity, there are about two traders who constantly perform quite well in every market. In fact, these traders are exactly those subjects with experience in stock markets or futures markets.


Chapter 7
Evolution of Agents in a
Simple Artificial Market
Hiroshi Sato
National Defense Academy, Japan

Masao Kubo
National Defense Academy, Japan

Akira Namatame
National Defense Academy, Japan

ABSTRACT
In this chapter, we conduct a comparative study of various traders following different trading strategies.
We design an agent-based artificial stock market consisting of two opposing types of traders: “rational
traders” (or “fundamentalists”) and “imitators” (or “chartists”). Rational traders trade by trying to
optimize their short-term income. On the other hand, imitators trade by copying the majority behavior
of rational traders. We obtain the wealth distribution for different fractions of rational traders and
imitators. When rational traders are in the minority, they can come to dominate imitators in terms of
accumulated wealth. On the other hand, when rational traders are in the majority and imitators are in
the minority, imitators can come to dominate rational traders in terms of accumulated wealth. We show
that survival in a finance market is a kind of minority game in behavioral types, rational traders and
imitators. The coexistence of rational traders and imitators in different combinations may explain the
market’s complex behavior as well as the success or failure of various trading strategies. We also show
that successful rational traders are clustered into two groups: In one group traders always buy and their
wealth is accumulated in stocks; in the other group they always sell and their wealth is accumulated in
cash. However, successful imitators buy and sell coherently and their wealth is accumulated only in cash.

INTRODUCTION

Economists have long asked whether traders who misperceive future prices can survive in a stock market. The classic answer, given by Friedman, is that they cannot. Friedman argued that mistaken investors buy high and sell low and as a result lose money to rational traders, eventually losing all their wealth.

On the other hand, Shleifer and his colleagues questioned the presumption that traders who misperceive returns do not survive (De Long, 1991). Since noise traders who are on average bullish bear more risk than do investors holding rational expectations, as long as the market rewards risk-taking, noise traders can earn a higher expected return even though they buy high and sell low on average. Because Friedman's argument does not take into account the possibility that some patterns of noise traders' misperceptions might lead them to take on more risk, it cannot be correct as stated.

It is difficult to reconcile the regular functioning of financial markets with the coexistence of different populations of investors. If there is a consistently winning market strategy, then it is reasonable to assume that the losing population will disappear in the long run. It was Friedman who first advanced the hypothesis that in the long run irrational investors cannot survive because they tend to lose wealth and disappear. For agents prone to forecasting errors, the fact that different populations with different trading strategies can coexist still requires an explanation.

Recent economic and finance research reflects growing interest in marrying the two viewpoints, that is, in incorporating ideas from the social sciences to account for the fact that markets reflect the thoughts, emotions, and actions of real people as opposed to the idealized economic investors who underlie efficient markets (LeBaron, 2000). Assumptions about the frailty of human rationality and the acceptance of such drives as fear and greed underlie the recipes developed over the decades in so-called technical analysis. There is growing empirical evidence of the existence of herd or crowd behavior. Herd behavior is often said to occur when many people take the same action, because some mimic the actions of others (Sornette, 2003).

To adequately analyze whether noise traders are likely to persist in an asset market, we need to describe the long run distribution of wealth, not just the level of expected returns. The question of whether there are winning and losing market strategies and how to characterize them has been discussed from a practical point of view in (Cinocotti, 2003). On the one hand, it seems obvious that different investors exhibit different investing behaviors that are responsible for the movement of market prices. On the other hand, it is difficult to reconcile the regular functioning of financial markets with the coexistence of heterogeneous investors with different trading strategies (Levy, 2000). If there exists a consistently winning market strategy, then it is reasonable to assume that the losing trading strategies will disappear in the long run through the force of natural selection.

In this chapter we take an agent-based model approach for a comparative study of different strategies. We examine how traders with various trading strategies affect prices and their success in the market measured by their accumulation of wealth. Specifically, we show that imitators may survive and come to dominate rational investors in wealth when the proportion of imitators is much less than that of rational traders.

The chapter is organized as follows: In Section 2 we survey the related literature. Section 3 describes the relationship between the Ising model and the Logit model. Sections 4 and 5 describe an artificial stock market as the main ingredient in our agent-based financial market. The simulation results and discussion are shown in Sections 6 and 7 respectively. Section 8 concludes the chapter.


with its invisible hand. The agents are assumed to be rational and homogeneous with respect to their access and their assessment of information; as a consequence, interactions among them can be neglected.

In recent literature, several papers try to explain the stylized facts as the macroscopic outcome of an ensemble of heterogeneous interacting agents (Cont, 2000; LeBaron, 2001). In this view, the market is populated by agents with different characteristics, such as differences in access to and interpretation of available information, different expectations or different trading strategies. The traders interact, for example, by exchanging information, or they trade by imitating the behavior of others. The market possesses, then, an endogenous dynamics, and the strict one-to-one relationship with the news arrival process does not hold any longer (although the market might still be efficient in the sense of a lack of predictability). The universality of the statistical regularities is seen as an emergent property of this internal dynamics, governed by the interactions among agents.

Boswijk et al. estimated an asset-pricing model using annual US stock price data from 1871 until 2003 (Boswijk, 2004). The estimation results support the existence of two expectation regimes. The first can be characterized as a fundamentalist regime because agents believe in mean reversion of stock prices toward the benchmark fundamental value. The second can be characterized as a chartist trend-following regime because agents expect deviations from the fundamental to trend. The fractions of agents using the fundamentalists forecasting rule and of agents using the trend-following forecasting rule show substantial time variation and switching between predictors.

They suggest that behavioral heterogeneity is significant and that there are two different regimes, a "mean reversion" regime and a "trend following" regime. To each regime, there corresponds a different investor type: fundamentalists and trend followers. These two investor types coexist and their fractions show considerable fluctuation over time. The mean-reversion regime corresponds to the situation in which the market is dominated by fundamentalists who recognize overpricing or underpricing of the asset and who expect the stock price to move back towards its fundamental value. The trend-following regime represents a situation when the market is dominated by trend followers expecting continuation of, for example, good news in the (near) future and so expect positive stock returns.

They also allow the coexistence of different types of investors with heterogeneous expectations about future payoffs and evolutionary switching between different investment strategies. Disagreement in asset pricing models can arise because of two assumptions: differential information and differential interpretation. In the first case, there is an information asymmetry between one group of agents that observes a private signal and the rest of the population that has to learn the fundamental value from public information, such as prices. Asymmetric information causes heterogeneous expectations among agents.

Agents use different "models of the market" to update their subjective valuation based on the earnings news, and this might lead them to hold different beliefs. However, the heterogeneity of expectations might play a significant role in asset pricing. A large number of models have been proposed that incorporate this hypothesis. They assume that agents adopt a belief based on its past performance relative to the competing strategies. If a belief performed relatively well, as measured by realized profits, it attracts more investors while the fraction of agents using the "losing" strategies will decrease. Realized returns thus contribute more support to some of the belief strategies than others, which leads to time variation in the sentiment of the market.

The assumption of evolutionary switching among beliefs adds a dynamic aspect that is missing in most of the models with heterogeneous opinions mentioned above. In our model investors


are boundedly rational because they learn from the past performance of the strategies which one is more likely to be successful in the near future. They do not use the same predictor in every period and make mistakes, but switch between beliefs in order to minimize their errors. Agents may coordinate expectations on trend-following behavior and mean reversion, leading to asset price fluctuations around a constant fundamental price.

Alfarano et al. (2004) also estimated a heterogeneous agent model (HAM) for exchange rates with fundamentalists and chartists and found considerable fluctuation of the market impact of fundamentalists. All these empirical papers suggest that heterogeneity is important in explaining the data, but much more work is needed to investigate the robustness of this empirical finding. Our chapter may be seen as one of the first attempts to estimate a behavioral HAM on stock market data and investigate whether behavioral heterogeneity is significant.

INFERRING UTILITY FUNCTIONS OF SUCCESSES AND FAILURES

In this section, we try to infer the utility functions of traders by relating the so-called Ising model and the Logit model. We clarify the following fact: success calls success and failure calls failure.

Ising Model

Bornholdt and his colleagues analyzed profit margins and volatility by using the Ising model, which is a phase transition model in physics (Bornholdt, 2001; Kaizoji, 2001). The Ising model is a model of magnetic substances proposed by Ising in 1925 (Palmer, 1994). In the model, there are two modes of spin: upward (S = +1) and downward (S = -1). Investment attitude in the investor model plays the same role as spin plays in the Ising model. In the model, magnetic interactions seek to align spins relative to one another. The character of the magnetic substance is determined by the interaction of the spins. In the investor model, the two spin states represent an agent's investment attitude. Each agent changes attitude according to the probability of the spin reversing.

The probability P_i(t + 1) that agent i buys at time t + 1 is defined as

P_i(t + 1) = 1 / [1 + exp(−2β h_i(t))],   (3.1)

where

h_i(t) = Σ_j J_ij S_j(t) − α S_i(t) |M(t)|   (3.2)

M(t) = Σ_i S_i(t) / N   (3.3)

In (3.1), h_i(t), defined in (3.2), represents the investment attitude of agent i, the parameter β is a positive constant, and J_ij represents the influence level of neighboring agent j. Therefore, the first term of (3.2) represents the influence of the neighborhood. The investment variables S_j, j = 1, 2, ..., N take the value −1 when agent j sells and +1 when she buys. The second term of (3.2) represents the average investment attitude, with α a positive constant. If many agents buy, then the investment attitude decreases. The investment attitude represents the agent's conformity with neighboring agents.

The average investment attitude should rise at least so that prices may rise more than this time step. In other words, it is necessary that the number of agents who purchase be greater than this term. It is thought that the probability of the investment attitude changing rises as the absolute value of M(t) approaches one. It can be said that the agent is "applying the brakes" to the action, where "action" refers to the opinion of the neighborhood.
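To make the dynamics of (3.1)–(3.3) concrete, the following minimal Python sketch performs one sweep of attitude updates. It is an illustration only: the neighborhood matrix, the parameter values, and the function name are our own assumptions rather than part of the original specification.

```python
import numpy as np

def sweep(S, J, alpha=4.0, beta=1.0, rng=None):
    """One sweep of the spin-market update of Eqs. (3.1)-(3.3).

    S : length-N array of +1 (buy) / -1 (sell) investment attitudes
    J : (N, N) array of neighborhood influences J_ij (zero outside the neighborhood)
    """
    rng = rng or np.random.default_rng()
    N = len(S)
    for i in rng.permutation(N):
        M = S.sum() / N                                # average attitude, Eq. (3.3)
        h = J[i] @ S - alpha * S[i] * abs(M)           # local field, Eq. (3.2)
        p_buy = 1.0 / (1.0 + np.exp(-2.0 * beta * h))  # buy probability, Eq. (3.1)
        S[i] = 1 if rng.random() < p_buy else -1
    return S
```

Whether M(t) is refreshed after every individual flip (as above) or only once per sweep is not pinned down in the chapter, so we treat it as an implementation choice.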


Logit Model (Stochastic Utility Theory)

The Logit model is based on stochastic utility theory applied to individual decision-making (Durlauf & Young, 2001). In stochastic utility theory, an agent is assumed to behave rationally by selecting the option that brings a high utility. But the individual's utility contains some random element. This uncertain factor is treated as a random variable in stochastic utility theory. The utilities associated with the choices of S_1 (buy) and S_2 (sell) are given as follows:

U_1 = V_1 + ε_1: the utility of choosing S_1,
U_2 = V_2 + ε_2: the utility of choosing S_2,
ε_i, i = 1, 2: random variables.

The probability of agent i buying is given by

p_i = Pr(U_1 > U_2) = Pr(V_1 + ε_1 > V_2 + ε_2) = Pr(V_1 − V_2 > ε_2 − ε_1).   (3.4)

By denoting the joint probability density function of the random variables ε_i, for i = 1, 2, by f(ε_1, ε_2), we can derive

p_i = ∫_{ε_1 = −∞}^{+∞} ∫_{ε_2 = −∞}^{V_1 − V_2 + ε_1} f(ε_1, ε_2) dε_2 dε_1.   (3.5)

Assume that the random variables ε_i, for i = 1, 2, are independent and that they follow the Gumbel distribution function F(x) = exp{−exp(−x)} (Levy, 2000). Then we can obtain the following expression by substitution and integration:

p_i = 1 / [1 + exp{−(V_1 − V_2)}].   (3.6)

The probability of agent i buying is given as a function of the difference between the utility of buying and the utility of selling.

Relating the Two Models

By equating (3.1) and (3.6), we can obtain the following relation:

1 / [1 + exp{−(V_1(t+1) − V_2(t+1))}] = 1 / [1 + exp(−2β h_i(t))].   (3.7)

If we set J_ij = J for all i, j, then we have

V_1(t+1) − V_2(t+1) = 2αβ [ (J/α) Σ_j S_j(t) − S_i(t) |M(t)| ]
 = 2αβ [ (J/α)(n_1(t) − n_2(t)) − S_i(t) |N_1(t) − N_2(t)| / N ]
 = (2αβ/N) [ (JN/α)(n_1(t) − n_2(t)) − S_i(t) |N_1(t) − N_2(t)| ]
 = (2αβ/N) [ λ(n_1(t) − n_2(t)) − S_i(t) |N_1(t) − N_2(t)| ],   (3.8)

where n_1(t) and n_2(t) denote the numbers of buying and selling agents among agent i's neighbors, and N_1(t) and N_2(t) are the corresponding numbers in the whole market. We also assume that JN/α = λ and 2αβ/N = 1, and consider the following two cases:

(Case 1) N_1(t) − N_2(t) ≥ 0: then we have

V_1(t+1) = λ n_1(t) − S_i(t) N_1(t)
V_2(t+1) = λ n_2(t) − S_i(t) N_2(t)   (3.9)

(Case 2) N_1(t) − N_2(t) ≤ 0: then we have

V_1(t+1) = λ n_1(t) + S_i(t) N_1(t)
V_2(t+1) = λ n_2(t) + S_i(t) N_2(t)   (3.10)


If the sellers (or purchasers) are in the minority, the market is called a seller (purchaser) market. Profit is large for those on the minority side of the market. The act of selling (purchasing) in a seller (purchaser) market is called here "success", and the opposite action is called "failure". If the purchasers are in the majority (N_1(t) − N_2(t) > 0), the market is a seller's market. On the other hand, if N_1(t) − N_2(t) < 0, the market is called a buyer market since the sellers are in the majority.

We can now classify the utility functions of a success and a failure as follows:

(1) The utility functions of a success are

V_1(t+1) = λ n_1 + N_1(t)
V_2(t+1) = λ n_2 + N_2(t)   (3.11)

(2) The utility functions of a failure are

V_1(t+1) = λ n_1 − N_1(t)
V_2(t+1) = λ n_2 − N_2(t)   (3.12)

In the market, the number of failures is more than the number of successes. Therefore, the investment attitudes of an agent who succeeds and an agent who fails will become opposite. Furthermore, the successful agent will again be in the minority in the next period, and likewise the failures will again be in the majority in the next period. Therefore the successful agents will continue to win, and the failing agents will continue to fail.
designers. On the negative side, it opens up an-


AN AGENT-BASED FINANCIAL MARKET MODEL

We consider a dynamic asset pricing model consisting of heterogeneous agents. We assume that the fundamental value of the stock is constant and publicly available to all agents, but they have different beliefs about the persistence of deviations of stock prices from the fundamental benchmark.

The most important design question faced in market building comes in the representation and structure of the actual trading agents. Agents can vary from simple budget-constrained zero-intelligence agents to those used in sophisticated genetic programming models. This variation in design is due to the fact that trading agents must solve a poorly defined task. Given that there are many ways to process past data, there must be as many ways to construct trading agents (Johnson, 2003).

The simplest and most direct route is to model agents by well-defined dynamic trading rules modeled more or less from strategies used in the real world. This method can lead to very tractable, precise results that give insight into the interactions between trading rules. Many markets of this type assume that the trading strategies will continue without modification, although the wealth levels under their control may be diminishing to zero. This leaves some open questions about coevolutionary dynamics with only a limited amount of new speciation. A second critique is that agents in these markets do not operate with any well-defined objective function. There is some usefulness to having well-defined objective functions for the agents. There may be an important tradeoff problem which only a simulation model can answer.

The second most important part of agent-based markets is the actual mechanism that governs the trading of assets. Once one leaves the relatively simple world of equilibrium modeling, it is necessary to think about the actual details of trading. This can be both a curse and a blessing to market designers. On the negative side, it opens up another poorly understood set of design questions. However, on the positive side, it may allow one to study the impact of different trading mechanisms, all of which would be inconsequential in an equilibrium world.

Most agent-based markets have solved this problem in one of three ways: by assuming a simple price response to excess demand, by building the market in such a way that a kind of local equilibrium price can be found easily, or by explicitly modeling the dynamics of trading to look like the continuous trading in an actual market. Most of the earliest agent-based markets used the first method to model price movements. Most markets of this type poll traders for their current demand, sum up the market demand and, if there is excess demand, increase the price. If there is an excess supply, they decrease the price.

This has been interpreted as evidence that as a forecaster ages, evaluators develop tighter prior beliefs about the forecaster's ability, and hence the forecaster has less incentive to herd with the group. On the other hand, the incentive for a second-mover to discard his private information and instead mimic the market leader increases with his initial reputation, as he strives to protect his current status and level of pay. In a practical implementation of a trading strategy, it is not sufficient to know or guess the overall direction of the market. There are additional subtleties governing how the trader is going to enter (buy or sell in) the market. For instance, a trader will want to be slightly ahead of the herd to buy at a better price, before the price is pushed up by the bullish consensus. Symmetrically, she will want to exit the market a bit before the crowd, that is, before a trend reversal. In other words, she would like to be somewhat of a contrarian by buying when the majority is still selling and by selling when the majority is still buying, slightly before a change of opinion of the majority of her "neighbors". This means that she will not always want to follow the herd, at least at finer time scales. At this level, she cannot rely on the polling of her "neighbors" because she knows that they, as well as the rest of the crowd, will have similar ideas about trying to outguess each other on when to enter the market. More generally, ideally she likes to be in the minority when entering the market, in the majority while holding her position, and again in the minority when closing her position.

HYPOTHETICAL VALIDATION USING AN AGENT-BASED MODEL

In this section we introduce three population types that have already been described in the literature and that represent more realistic trading behaviors. The aim is twofold: First, we want to study the behavior of these stylized populations in a realistic environment characterized by limited resources and a market clearing mechanism. Second, we want to address the important issue of whether or not winning strategies exist. The fractions of agents using the fundamental and trend-following forecasting rules show substantial time variation and switching between predictors.

Market Mechanism and Performance Measures

One of the most important parts of agent-based markets is the actual mechanism that governs the trading of assets. Most agent-based markets assume a simple price response to excess demand, and the market is built so that finding a local equilibrium price is not difficult. If supply exceeds demand, then the price decreases. Agents maintain stock and capital, and stock is bought or sold in exchange for capital. The model generates fluctuations in the value of the stock by limiting transactions to one unit of stock.

Price model. The basic model assumes that the stock price reflects the level of excess demand, which is governed by


P(t) = P(t − 1) + χ[N_1(t) − N_2(t)],   (5.1)

where P(t) is the stock price at time t, N_1(t) and N_2(t) are the corresponding numbers of agents buying and selling respectively, and χ is a constant. This expression implies that the stock price is a function of the excess demand. That is, the price rises when there are more agents buying, and it descends when more agents are selling.

Price volatility.

v(t) = [P(t) − P(t − 1)] / P(t − 1).   (5.2)

Individual wealth. We introduce the notional wealth W_i(t) of agent i into the model as follows:

W_i(t) = P(t)·Φ_i(t) + C_i(t),   (5.3)

where Φ_i is the amount of assets (stock) held and C_i is the amount of cash (capital) held by agent i. It is clear from the equation that an exchange of cash for assets at any price does not in any way affect the agent's notional wealth. However, the important point is that the wealth W_i(t) is only notional and not real in any sense. The only real measure of wealth is C_i(t), the amount of capital the agent has available to spend. Thus, it is evident that an agent has to do a "round trip" (buy (sell) a stock and then sell (buy) it back) to discover whether a real profit has been made.
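The price impact rule (5.1), the volatility measure (5.2), and the notional wealth (5.3) translate directly into code. The short sketch below is only an illustration; the value of the constant χ and the function names are our own assumptions.

```python
def update_price(P_prev, n_buyers, n_sellers, chi=0.01):
    """Excess-demand price response, Eq. (5.1)."""
    return P_prev + chi * (n_buyers - n_sellers)

def volatility(P, P_prev):
    """One-step relative price change, Eq. (5.2)."""
    return (P - P_prev) / P_prev

def notional_wealth(P, stock, cash):
    """Notional wealth of an agent, Eq. (5.3): W_i(t) = P(t) * Phi_i(t) + C_i(t)."""
    return P * stock + cash
```

As the text stresses, only the cash component is "real": a buy followed later by a sell (a round trip) is needed before any change in notional wealth is actually realized.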
Trader Types

For modeling purposes, we use representative agents (rational agents) who make rational decisions in the following stylized terms: If they expect the price to go up then they buy, and if they expect the price to go down then they sell immediately. But this then leads to the problem, what happens if every trader behaves in this same way?

Here, some endogenous disturbances need to be introduced. Given these disturbances, the individual is modeled to behave differently. One body of research has sought to explain the data with aggregate models in which a representative agent solves this optimization problem. If the goal is simply to fit the data, it is not unreasonable to attribute to agents the capacity to explicitly formulate and solve dynamic programming problems. However, there is strong empirical evidence that humans do not perform well on problems whose solution involves backward induction. For this reason, these models fail to provide a realistic account of the phenomenon. The model we describe will not invoke a representative agent, but will posit a heterogeneous population of individuals. Some of these will behave "as if" they were fully informed optimizers, while others will not. Social networks and social interactions–clearly absent from the prevailing literature–will play an explicit central role.

Heterogeneity turns up repeatedly as a crucial factor in many evolving systems and organizations. But the situation is not always as simple as saying that heterogeneity is desirable and homogeneity is undesirable. This remains a basic question in many fields: What is the right balance between heterogeneity and homogeneity? When heterogeneity is significant, we need to be able to show the gains associated with it. However, analysis of a collection of heterogeneous agents is difficult, often intractable.

The notion of type facilitates the analysis of heterogeneity. A type is a category of agents within the larger population who share some characteristics. We distinguish types by some aspects of the agents' unobservable internal model that characterize their observable behaviors. One can imagine how such learning models might evolve over time towards equilibria. In principle, this evolutionary element can be folded into a meta-learning that includes both the short-term learning and long-term evolution.

Interaction between agents is a key feature of agent-based systems. Traditional market models do not deny that agents interact but assume that they only do so through the price system. Yet


agents do, in fact, communicate with each other and learn from each other. The investor who enters the market forecasts stock prices by various techniques. For example, the investor makes a linear forecast of past price data, forecasts based on information from news media, and so forth. The types of typical investors are usually described based on differences in their forecast methods (or methods of deciding their investment attitude). Three typical investor types are as follows:

• Fundamentalist: investor with a fundamentalist investment attitude based on various economic indicators.
• Chartist: investor who uses analysis techniques for finding present value from charting past price movement.
• Noise trader: investor who behaves according to a strategy not based on fundamental analysis.

In this chapter, traders are segmented into two basic types depending on their respective trading behavior: rational traders and imitators. Rational traders are further classified into two types: momentum and contrarian traders.

1. Rational traders. If we assume the fundamental value is constant, their investment strategy is based on their expectation of the trend continuing or reversing.
   Momentum trader: These traders are trend followers who make decisions based on the trend of past prices. A momentum trader speculates that if prices are rising, they will keep rising, and if prices are falling, they will keep falling.
   Contrarian trader: These traders differ in trading behavior. Contrarian traders speculate that, if the price is rising, it will stop rising soon and will decrease, so it is better to sell near the maximum. Conversely, if the price is falling, it will stop falling soon and will rise. This trading behavior is present among actual traders, albeit probably less popular than trend-following strategies.
2. Imitators. Traders may have incorrect expectations about price movements. If there are such misperceptions, imitators who do not affect prices may earn higher payoffs than strategic traders. Each imitator has a unique social network with strategic traders. Within this individual network, if the majority of strategic traders buy then she also buys, and if the majority of strategic traders sell then she also sells. It is now widely held that mimetic responses result in herd behavior and, crucially, that the properties of herding arise in financial markets.

Trading Rules of Trader Types

Agents are categorized by their strategy space. Since the space of all strategies is complex, this categorization is not trivial. Therefore, we might, for example, constrain the agents to be finite automata with a bounded number of states. Even after making this kind of limitation, we might still be left with too large a space to reason about, but there are further disciplined approaches to winnowing down the space. An example of a more commonly used approach is to assume that the opponent is a "rational learner" and to place restrictions on the opponent's prior about our strategies. In this section we describe a trading rule for each type of trader discussed in the previous section.

1. Rational traders (fundamentalists): Rational traders observe the trend of the market and trade so that their short-term payoff will be improved. Therefore if the trend of the market is "buy", this agent's attitude is "sell". On the other hand, if the trend of the market is "sell", this agent's attitude is "buy". As has been explained, trading according to the minority decision creates wealth for the agent on performing the necessary trade, whereas trading according to


the majority decision loses wealth. However, if the agent has held the asset for a length of time between buying it and selling it back, his wealth will also depend on the rise and fall of the asset price over the holding period. On the other hand, the amount of stock that the purchaser (seller) can put in a single deal and buy (sell) is one unit. Therefore, when the numbers of purchasers and sellers are different, there exists an agent who cannot make her desired transaction:

◦ When sellers are in the majority: There is an agent who cannot sell even if she is selected to sell. Because the price still falls in a buyer's market, the agents who sell are those maintaining a large amount of property. The agents who maintain the most property are the ones able to sell.
◦ When buyers are in the majority: There is an agent who cannot buy even if she is selected to buy. Because the price rises, the agent still able to buy is the one maintaining a large amount of capital. The agents who maintain the most capital are the ones able to buy.

The above trading behavior is formulated as follows. We use the following terminology:

N_1(t): Number of agents who buy at time t.
N: Number of agents who participate in the market.
R(t) = N_1(t) / N: The rate of agents buying at time t.

We also denote the estimated rate of buying of agent i at time t as

R_F(t) = R(t − 1) + ε_i,   (5.5)

where ε_i (−0.5 < ε_i < 0.5) is the rate of bullishness and timidity of the agent and differs depending on the agent.

Trading rule for rational traders:

If R_F(t) > 0.5 then sell
If R_F(t) < 0.5 then buy   (5.6)

If ε_i is large, agent i has a tendency to "buy", and if it is small, agent i has a tendency to "sell".

2. Imitators (chartists): These agents watch the behavior of the rational traders. If the majority of rational traders "buy" then the imitators "buy", and if the majority of rational traders "sell" then the imitators "sell".

We can formulate the imitators' behavior as follows:

R_S(t): The fraction of rational traders buying at time t.
P_I(t): The value of R_S(t) estimated by imitator j.

P_I(t) = R_S(t − 1) + ε_j,   (5.7)

where ε_j (−0.5 < ε_j < 0.5) is the rate of bullishness and timidity of imitator j; in these experiments, ε_j is normally distributed.

Trading rule for imitators:

If P_I(t) > 0.5 then buy
If P_I(t) < 0.5 then sell   (5.8)
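The two decision rules (5.5)–(5.8) can be written as two small Python functions. This is a hedged sketch: the handling of the tie case (exactly 0.5) and the function names are our own choices, not specified in the chapter.

```python
def rational_decision(R_prev, eps_i):
    """Rational (minority-seeking) trader, Eqs. (5.5)-(5.6).

    R_prev : fraction of all agents that bought last period
    eps_i  : the agent's bullishness/timidity in (-0.5, 0.5)
    """
    R_F = R_prev + eps_i                      # estimated rate of buying, Eq. (5.5)
    return "sell" if R_F > 0.5 else "buy"     # Eq. (5.6)

def imitator_decision(RS_prev, eps_j):
    """Imitator, Eqs. (5.7)-(5.8).

    RS_prev : fraction of rational traders that bought last period
    eps_j   : imitator j's bullishness/timidity
    """
    P_I = RS_prev + eps_j                     # estimate of the rational traders' behavior, Eq. (5.7)
    return "buy" if P_I > 0.5 else "sell"     # Eq. (5.8)
```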
SIMULATION RESULTS

We consider an artificial stock market consisting of 2,500 traders in total. In Figure 1, we show market prices over time for varying fractions of rational traders and imitators.
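For orientation, the pieces above can be wired into a minimal simulation loop of the kind that produces price series such as those in Figure 1. The sketch below is our own minimal wiring under stated assumptions (uniform ε, no one-unit trading constraint, no wealth bookkeeping); it is not the authors' code.

```python
import random

def simulate(T=1000, N=2500, frac_rational=0.7, chi=0.01, P0=100.0):
    """Minimal illustration combining Eqs. (5.1) and (5.5)-(5.8)."""
    n_rat = int(N * frac_rational)
    prices, R_prev, RS_prev = [P0], 0.5, 0.5
    for _ in range(T):
        # rational traders trade against the estimated majority, Eqs. (5.5)-(5.6)
        rat_buys = sum(1 for _ in range(n_rat)
                       if R_prev + random.uniform(-0.5, 0.5) < 0.5)
        # imitators follow the rational traders, Eqs. (5.7)-(5.8)
        imi_buys = sum(1 for _ in range(N - n_rat)
                       if RS_prev + random.uniform(-0.5, 0.5) > 0.5)
        n_buy = rat_buys + imi_buys
        prices.append(prices[-1] + chi * (n_buy - (N - n_buy)))   # Eq. (5.1)
        R_prev, RS_prev = n_buy / N, rat_buys / max(n_rat, 1)
    return prices
```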


Figure 1. Market prices over time for varying fractions of rational traders and imitators

Stock Prices Over Time

Imitators mimic the movement of a small number of rational traders. If the rational traders start to raise the stock price, the imitators also act to raise the stock price. If the rational traders start to lower the stock price, the imitators lower the stock price further. Therefore, the actions of a large number of imitators amplify the price movement caused by the rational traders, increasing the fluctuation in the value of the stock.

Increasing the fraction of rational traders stabilizes the market. Maximum stability is achieved when the fraction of rational traders in the population is 70% and that of the imitators 30%. On the other hand, increasing the number of rational traders further induces more fluctuation, and the price will cycle up and down if the fraction of rational traders is increased to 80%. Rational traders always trade in whatever direction places them in a minority position. In this situation, their actions do not induce fluctuations in the market price.


Figure 2. Movement of price over time when the fraction of rational traders increases gradually from 20% to 80%

Figure 3. Movement of price over time when the fraction of rational traders moves randomly between 20% and 80%

However, when rational traders are in the majority, their movements give rise to large market price fluctuations.

In Figure 2 we show the price movement when the fraction of rational traders is increased gradually from 20% to 80%. Figure 3 shows the price movement when the fraction of rational traders moves randomly between 20% and 80%.

Comparison of Wealth

We also show the average wealth of the rational traders and imitators over time for varying fractions of rational traders and imitators (Figure 4).

We now conduct a comparative study of rational traders and imitators. Imitators only mimic the actions of rational traders. On the other hand, rational traders deliberately consider the direction of movement of the stock price. Our question is which type is better off in terms of their accumulated wealth.

When rational traders are not in the majority (their fraction is less than 50%), their average wealth increases over time and that of the imitators decreases. Therefore, if the rational traders are in the minority, they are better off and their successful accumulation of wealth is due to losses by the majority, the imitators.

In the region where the number of rational traders is almost the same as the number of imitators, no trader is a winner or a loser and none accumulates wealth.

On the other hand, when the rational traders are in the majority and the imitators are in the minority, the average wealth of the imitators increases over time and that of the rational traders decreases. Therefore, when the imitators are in the minority, they are better off and their successful accumulation of wealth is due to losses by the majority, the rational traders.

Evolution of the Population

We then change the composition of the traders using an evolutionary technique. Eventually, poor traders learn from other, wealthy traders. Figure 5 shows the two typical cases of evolution.

Domination occurs when traders evolve according to total assets because it takes some time to reverse the disparity in total assets between winners and losers. On the other hand, rational agents and imitators coexist when they evolve according to their gain in assets. An important point is that coexistence is not a normal situation. Various conditions are necessary for both types to coexist, including an appropriate updating scheme.
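The chapter distinguishes evolution driven by total accumulated wealth from evolution driven by the recent gain in wealth. The sketch below shows one possible updating scheme of this kind; the random-role-model copying rule and the absence of mutation are our own simplifications, not details taken from the chapter.

```python
import random

def evolve(strategies, fitness):
    """Poor traders copy the strategy of wealthier traders.

    strategies : list of strategy labels, one per agent
    fitness    : matching list of scores -- total wealth or recent gain in wealth,
                 depending on which evolutionary variant is being studied
    """
    new = list(strategies)
    for i in range(len(strategies)):
        j = random.randrange(len(strategies))   # pick a random role model
        if fitness[j] > fitness[i]:
            new[i] = strategies[j]              # imitate the better performer
    return new
```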


Figure 4. Changes in average wealth over time for different fractions of rational traders and imitators

IMPLICATION OF SIMULATION RESULTS

The computational experiments performed using the agent-based model show a number of important results. First, they demonstrate that the average price level and the trends are set by the amount of cash present and eventually injected into the market. In a market with a fixed amount of stocks, a cash injection creates an inflationary pressure on prices. The other important finding of this work is that different populations of traders characterized by simple but fixed trading strategies cannot coexist in the long run. One population prevails and the other progressively loses weight and disappears. Which population will prevail and which will lose cannot be decided on the basis of their strategies alone.


Figure 5. Time path of the composition of traders’ types (a) Evolution by wealth (sum of cash and stocks),
(b) Evolution by gain in wealth

Trading strategies yield different results under different market conditions. In real life, different populations of traders with different trading strategies do coexist. These strategies are boundedly rational and thus one cannot really invoke rational expectations in any operational sense. Though market price processes in the absence of arbitrage can always be described as the rational activity of utility-maximizing agents, the behavior of these agents cannot be operationally defined. This work shows that the coexistence of different trading strategies is not a trivial fact but requires explanation.

One could randomize strategies, imposing that traders statistically shift from one strategy to another. It is however difficult to explain why a trader embracing a winning strategy should switch to a losing strategy. Perhaps the market changes continuously and makes trading strategies randomly more or less successful. More experimental work is necessary to gain an understanding of the conditions that allow the coexistence of different trading populations. As noted earlier, there are two broad types of agents and we designate them "strategic traders" ("rational agents") and "imitators". The agents in our model fall into two categories. Members of one group (strategic traders) adopt the optimal decision rules. If they expect the price to go up, then they will buy, and if they expect the price to go down, then they will sell immediately. In order to introduce heterogeneity among strategic agents we also introduce some randomness in the behavioral rules. The other group consists of imitators, who mimic the strategic traders of their social networks. The model we describe does not invoke a representative agent, but posits a heterogeneous population of agents. Some of these behave as if they are fully informed optimizers, while others do not.

SUMMARY AND FUTURE WORK

Experimental economics and psychology have now produced strong empirical support for the view that framing effects as well as contextual and other psychological factors put a large gap between homo sapiens and individuals with bounded rationality. The question we pose in this chapter is as follows: Does that matter, and how does it matter? To answer these questions, we developed a model in which imitation in social networks can ultimately yield high aggregate levels of optimal behavior. It should be noted that the fraction of agents who are rational in such an imitative system will definitely affect the stock market. But the eventual (asymptotic) attainment per se of such a state need not depend on the


extent to which rationality is bounded. Perhaps the main issue then is not how much rationality there is at the micro level, but how little is enough to generate macro-level patterns in which most agents are behaving "as if" they were rational, and how various social networks affect the dynamics of such patterns.

We conclude by describing our plan for further research. An evolutionary selection mechanism based on relative past profits will govern the dynamics of the fractions and the switching of agents between different beliefs or forecasting strategies. A strategy attracts more agents if it performed relatively well in the recent past compared to other strategies. There are two related theoretical issues. One is the connection between individual rationality and aggregate efficiency, that is, between optimization by individuals and optimality in the aggregate. The second is the role of social interactions and social networks in individual decision-making and in determining macroscopic outcomes and dynamics. Regarding the first, much of mathematical social science assumes that aggregate efficiency requires individual optimization. Perhaps this is why bounded rationality is disturbing to most economists: They implicitly believe that if the individual is not sufficiently rational it must follow that decentralized behavior is doomed to produce inefficiency.

REFERENCES

Alfarano, S., Wagner, F., & Lux, T. (2004). Estimation of agent-based models: The case of an asymmetric herding model.

Bornholdt, S. (2001). Expectation bubbles in a spin model of markets. International Journal of Modern Physics C, 12(5), 667–674. doi:10.1142/S0129183101001845

Boswijk, H. P., Hommes, C. H., & Manzan, S. (2004). Behavioral heterogeneity in stock prices.

Cincotti, S., Focardi, S., Marchesi, M., & Raberto, M. (2003). Who wins? Study of long-run trader survival in an artificial stock market. Physica A, 324, 227–233. doi:10.1016/S0378-4371(02)01902-7

Cont, R., & Bouchaud, J.-P. (2000). Herd behavior and aggregate fluctuations in financial markets. Macroeconomic Dynamics, 4, 170–196.

De Long, J. B., Shleifer, A., Summers, L. H., & Waldmann, R. J. (1991). The survival of noise traders in financial markets. The Journal of Business, 64(1), 1–19. doi:10.1086/296523

Durlauf, S. N., & Young, H. P. (2001). Social Dynamics. Brookings Institution Press.

Johnson, N., Jeffries, P., & Hui, P. M. (2003). Financial Market Complexity. Oxford University Press.

Kaizoji, T., Bornholdt, S., & Fujiwara, Y. (2002). Dynamics of price and trading volume in a spin model of stock markets with heterogeneous agents. Physica A.

LeBaron, B. (2000). Agent-based computational finance: Suggested readings and early research. Journal of Economic Dynamics & Control, 24, 679–702. doi:10.1016/S0165-1889(99)00022-6

LeBaron, B. (2001). A builder's guide to agent-based financial markets. Quantitative Finance, 1(2), 254–261. doi:10.1088/1469-7688/1/2/307

Levy, M., Levy, H., & Solomon, S. (2000). Microscopic Simulation of Financial Markets: From Investor Behavior to Market Phenomena. San Diego: Academic Press.

Lux, T., & Marchesi, M. (1999). Scaling and criticality in a stochastic multi-agent model of a financial market. Nature, 397, 498–500. doi:10.1038/17290


Palmer, R. G., Arthur, W. B., Holland, J. H., LeBaron, B., & Tayler, P. (1994). Artificial economic life: A simple model of a stock market. Physica D: Nonlinear Phenomena, 75, 264–274. doi:10.1016/0167-2789(94)90287-9

Raberto, M., Cincotti, S., Focardi, M., & Marchesi, M. (2001). Agent-based simulation of a financial market. Physica A, 299(1-2), 320–328. doi:10.1016/S0378-4371(01)00312-0

Sornette, D. (2003). Why Stock Markets Crash. Princeton University Press.

Tesfatsion, L. (2002). Agent-based computational economics: Growing economies from the bottom up. Artificial Life, 8, 55–82. doi:10.1162/106454602753694765

KEY TERMS AND DEFINITIONS

Artificial Market: An approach to studying markets by creating a market artificially.
Agent-Based Model: A class of computational models for simulating the actions and interactions of autonomous agents.
Rational Trader: A type of trader whose decisions to buy, sell, or hold are based on fundamental analysis.
Noise Trader: A type of trader whose decisions to buy, sell, or hold are not based on fundamental analysis.
Ising Model: A mathematical model of ferromagnetism in statistical mechanics.
Logit Model: A mathematical model of human decision-making in statistics.


Chapter 8
Agent-Based Modeling Bridges
Theory of Behavioral Finance
and Financial Markets
Hiroshi Takahashi1
Keio University, Japan

Takao Terano2
Tokyo Institute of Technology, Japan

ABSTRACT
This chapter describes advances in applying agent-based models to financial market analyses, based on our recent research. We have developed several agent-based models to analyze microscopic and macroscopic links between investor behaviors and price fluctuations in a financial market. The models are characterized by a methodology that analyzes the relations between micro-level decision-making rules of the agents and macro-level social behaviors via computer simulations. In this chapter, we report the outline of recent results of our analysis. From the extensive analyses, we have found that (1) investors' overconfident behavior plays various roles in a financial market, (2) overconfident investors emerge in a bottom-up fashion in the market, (3) they contribute to efficient trades in a market that adequately reflects fundamental values, (4) the passive investment strategy is valid in a realistically efficient market, but it could have adverse influences such as market instability and inadequate asset-pricing deviations, and (5) under certain assumptions, the passive investment strategy and the active investment strategy could coexist in a financial market.

INTRODUCTION

Research in financial economics has been active since the 1950s, and many prominent theories regarding asset pricing and corporate finance have been proposed (Markowitz, 1952; Modigliani & Miller, 1958; Sharpe, 1964; Shleifer, 2000). The assumption of the efficiency of financial markets plays an important role in the literature of traditional financial theory, and much research has been conducted based on this assumption (Friedman, 1953; Fama, 1970). For example, CAPM (the Capital Asset Pricing Model), one of the most popular asset pricing theories in the traditional financial literature, is derived based on the assumptions of the efficient market and rational

DOI: 10.4018/978-1-60566-898-7.ch008


investors. CAPM indicates that the optimal The next section of this chapter describes
investment strategy is to hold market portfolio the model utilized for this analysis,then analysis
(Sharpe, 1964). results are discussed in sections 3 and 4. Section
However, conventional finance theory meets 5 contains summary and conclusion.
severe critiques about the validities of the as-
sumptions on the markets, or the capabilities to
explain real world phenomena. For example, the DESCRIPTION OF AN AGENT-
worldwide financial crisis in 2008 was said to BASED FINANCIAL MARKET MODEL
be the one, which would occur per ten decades.
Recently, N. N. Taleb describes the role of acci- Basic Framework and Architecture
dental effects in a financial markets and human of Models of a Financial Market
cognitions about the effects (Taleb, 2001). Also,
researchers in behavioral finance have raised some In our research, first, we have observed the macro
doubts about the efficient market assumption, level phenomena of a real financial market, then,
by arguing that an irrational trader could have second, we have modeled the phenomena in an
influences on asset prices (Shiller, 2000; Shleifer, artificial market in a computer. To model the mar-
2000; Kahneman, Tversky, 1979; Kahneman, ket, third, we have introduced micro level decision
Tversky, 1992). making strategies of human investors based on
To address the problems, we employ agent- the recent research on behavioral financial theory
based model (Arthur, 1997; Axelrod, 1997) in and cognitive science (Shleifer, 2000). Forth, we
order to analyze the relation between micro-rules have designed micro-macro level interactions in
and macro-behavior (Axtell, 2000; Russell, 1995). the artificial market, which are not able to be ex-
In the literature, they have frequently reported that amined in the real world. Therefore, our method is
a variety of macro-behavior emerges bottom-up a constructive approach to bridge the state-of-the
from local micro-rules (Epstein, 1996; Levy, art financial theory and real behaviors in a market
2000; Terano, 2001; Terano, 2003; Arthur, 1997; through agent-based models. The framework is
Tesfatsion, 2002). We have developed an artificial summarized in Figure 1.
financial market model with decision making Based on the framework, we have imple-
agents. So far, we have reported on micro-macro mented a common artificial market model de-
links among agents and markets, investors’ be- picted in Figure 2. The market model is character-
haviors with various mental models, and risk ized as follows: (1) benefit and/or loss of a firm
management strategies of the firms (Takahashi, is randomly determined, (2) the information is
2003; Takahashi, 2004; Takahashi, 2006; Taka- observed by investor agents to make their invest-
hashi, 2007; Takahashi, 2010). In this chapter, ment decisions, (3) based on the decisions, agents
based on our recent research, we will describe the trade the financial assets in the artificial market,
basic principles and architecture of our simulator and the market prices are determined, and (4) the
and explain our main findings. The objective of determined prices of the market again give the
the research is to investigate (1) the influences of effects of decision making of the agents. The
micro- and macro-level of investment strategies, detailed descriptions of the model are given below.
(2) roles of the evaluation method, and (3) financial A agent-based simulator of the financial mar-
behaviors, when there are so many investors with ket involving 1,000 investors is used as the
different strategies. model for this research. Several types of investors
exist in the market, each of them undertakes

135
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 1. Framework of agent-based financial research

Figure 2. Outline of a common artificial market

transactions based on their own stock calculations. Assets Traded in the Market
They share and risk-free assets with the two pos-
sible transaction methods. The execution of the The market consists of both risk-free and risky
simulator consists of the three major steps: (1) assets. About the risky assets, all profits gained
generation of corporate earnings, (2) formation during each term are distributed to the sharehold-
of investor forecasts, and (3) setting transaction ers. Corporate earnings (yt ) are expressed as
prices. The market conditions will change through
these steps. About the details of parameters of the (y t )
= yt −1 ⋅ (1 + et ) . They are generated accord-
simulator, please refer to the appendix 1. ( )
ing to the process εt ~ N 0, σy 2 . Risky assets

136
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

are traded just after the public announcement of (b) Trend Predictors
profit for the term. Each investor is given common
asset holdings at the start of the term with no We formulate a model of the investor who
limit placed on debit and credit transactions. finds out the trends from randomly fluctuate stock
prices. This type of investor predicts the stock
Modeling Passive Investors ( )
price Pt +f 1 of the next period by extrapolating
the latest stock trends (10 days). The trend predic-
Passive investors of the simulation model invest
their assets with the same ratio of the market ( )
tors forecast the stock price Pt +f 1 and the profit
benchmarks. This means that (1) each passive (y ) f
t +1
from the trend at period t-1 as
investors keeps one volume stock during the
Pt +f 1 = Pt −1 ⋅ (1 + at −1 ) andytf+1 = yt ⋅ (1 + at −1 ),
2
investment periods, (2) the investment ratio to
the stocks is automatically determined, and (3)
the trade strategy follows buy-and-hold of initial
where ( ( ) )
at −1 = 1 10 ⋅ ∑ i =1 Pt −i Pt −i −1 − 1 .
10

interests. Predicted price (P ) and profit (y ) are differ-


f
t +1
f
t +1

ent when trend measurement period is different.


Modeling Active Investors
(c) Loss over-estimation investors
Active investors make decisions based on expected
utility maximization method described in (Black, We formulate a model in which the investor
Litterman, 1992). Contrary to passive investors, doubles the loss estimates from the reference stock
active investors forecast the stock price. In the price. In the model, the reference stock price is
following section, we will explain the forecasting the one of the 10 periods beforehand. When the
models of active investors. most recent price (Pt−1 ) is lower than the price at

Forecasting Models of Investors ( )


the reference point Ptref , the “Loss over-estima-
tion investors” forecast the stock price Pt +f 1 by ( )
(a) Fundamentalists
converting the original predicted price Pt +1 ( bef f
)
We will refer to the investors who make invest- u s i n g t h e f o r m u l a
ment decisions based on fundamental values as P f bef f
= 2.25 ⋅ Pt +1 − 1.25 ⋅ Pt . As for the
ref
t +1
‘fundamentalists’. We adopt the dividend discount
model, which is the most basic model to determine original predicted price Pt +1 ( bef f
), we use the
the fundamental value of stocks. The fundamen- dividend discount model.
talists are assumed to know that the profit accrues
according to Brownian motion. They forecast the (d) Overconfident Investors
stock price Pt +1
f
and the profit ytf+1 from the
profit of current period (yt ) and the discount rate Bazerman reported that human beings tend to
be overconfident in his/her own ability (Bazer-
of the stock (d ) as Pt +f 1 = yt d and ytf+1 = yt ,
man,1998). In the area of Behavioral Finance,
respectively. Kyle analyzed the influence of the overconfident
investment behaviors on the markets with the ana-

137
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

lytical method (Kyle,1997). Also in a real market,


we often find that each investor talks about dif-
of the stock held by investors (∑ (F ⋅ w ) P )
M
i =1 t
i i
ti t

ferent future prospects with confidence. It seems are the decreasing function of the stock price, and
that all investors tend to have overconfidence in the total number of the stock issued in the market
varying degrees. (N ) is constant. We derive the traded price as
We formulate the model of investors who are
∑ (F ⋅ w ) P = N by calculating the price
M i i
t ti t
overconfident in their own predictions by assum- i =1

ing that they underestimate the risk of the stock. (P ), where the total amount of the stock retained
t

The risk of the stock estimated by an overconfident


((
by investors Fti ⋅ wti ) P ) meets the total market
( )
t
investor ss is calculated from the historical
i

value of the stock.


volatility (s ) and the adjustment factor to de-
h

termine the degree of overconfidence constant Natural Selection Rules

( ) ( ).
2 2
value k (k = 0.6) as s s = k sh Natural selection rules are equipped in the market
to represent the changes of cumulative excess
Calculation of Expected return for the most recent 5 terms (Takahashi,
Return Rate of the Stock Terano, 2003). The rules are divided into the
two steps: (1) appointment of investors who alter
The investors in the market predict the stock price their investment strategy, and (2) alteration of
( )
Pt +f 1 and the corporate profit ytf+1 at the term ( ) investment strategy. With the alteration of invest-
t+1 based on the corporate profit (yt ) at the term ment strategy, existence of investment strategy
alteration is decided based upon the most recent
t and the stock prices at and before the term t-1
performance of each 5 term period after 25 terms
(Pt −1, Pt −2 , Pt −3 ). In the following, we represent have passed since the beginning of market trans-
the predicted values of the stock price and the actions. In addition, for the simplicity, investors
corporate profit by the investor i (i = 1, 2, 3 ) are assumed to decide to alter investment strategy
as Pty+,i1 and ytf+,i1, respectively. The expected with the higher cumulative excess return over at
most recent 5 terms.
rate of return on the stock for the investor i rtint,
+1
i
( ) Using the models, we have investigated the
is calculated as follows: roles of over-confidence agents and passive and
rint,i 
( )
= rtim ⋅ c −1 ⋅ sts−1
−2
+r f ,i
( )
s
⋅ s
−2   −1
( )
 ⋅ c ⋅ st −1
s
−2
( )
+ sts−1
−2 −1
 ,
active strategies in the following two sections.
t +1
 t +1 t −1
  

((
where rt f+,i1 = Pt +f ,1i + ytf+,i1 Pt − 1 ⋅ 1 + eti and ) )( ) Over-Confidence Agents
will Survive in a Market
( )
2
rtim = 2λ σts−1 Wt −1 + rf
(Black,Litterman,1992). First, we have analyzed the initial forecasting
model ratio where there was (1) a higher ratio of
Determination of Trading Prices fundamental forecasting, and (2) a higher ratio of
trend forecasting. As the results of this analysis
The traded price of the stock is determined at the have suggested that these higher ratio strength
price the demand meets the supply (Arthur,1997). the degree of overconfidence in both cases, then,
Both of the investment ratio wti and the number ( ) we have analyzed the random distribution of the
initial ratio of each forecasting model to determine

138
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

whether the same result could be obtained under the average value of past equity prices. In contrast,
different conditions. The results of this analysis overconfident investors survive in the market
are explained in detail below. even when a random initial value is applied for
the degree of overconfidence (Figures 6 and 7).
Searching for Investment Strategies This interesting analysis result suggests the
possibility of universality when survival trends
(a) When there is a High Ratio of Fundamental of overconfident investors are compared with the
Forecasting forecasting model.

As fundamentalists enforce a strong influence Exploring Market Conditions


on the market value under these conditions, the
market value is almost in concord with the funda- This subsection explores the conditions in which
mental value (Figure 3). We have confirmed that transaction prices reach the fundamental values.
the number of fundamentalists is on the increase
due to the rules of natural selection in regard to the (a) Inverse Simulation Method
transition of investor numbers (Figure 4). Looking
at transition in the degree of overconfidence, the Inverse Simulation Analysis consists of the
degree of overconfidence strongly remains as the following four steps: (1) Carry out 100 times
simulation steps go forward (Figure 5). simulations with an investment period of 100 terms;
(2) Calculate the deviation between transaction
(b) When there is a High Ratio of Trend prices and the fundamental value for each simula-
Forecasting tion; (3) Select the simulation cases in which the
deviation value is small, and (4) Using recombina-
When there is a high ratio of investors us- tion operations of Genetic Algorithms, set the
ing trend forecasting, the market value deviated calculated index better ones, (5) repeat the process
greatly from the fundamental value. The number 100 simulation generations. The index (q ) of de-
of investors using trend forecasting also increases viation between transaction prices and the funda-
as such investors enforce a strong influence on mental value expresses the deviation ratio with the
the market value. There is the investment environ- fundamental value and is specifically calculated
ment, in which different forecasting methods were 2
applied to obtain excess return. On the other hand, asq = E x  + Var x  . Pt 0 represents the funda-
investors with a strong degree of overconfidence ( )
mental value x t = Pt − Pt 0 Pt 0 for time t.
survive in the market even under these conditions.
(b) Experimental Results
(c) When the Initial Ratio is Applied Randomly
The experimental results are summarized in
We have analyzed the case in which the initial Figures 8 and 9. In Figure 8, the transaction prices
ratio of investors is randomly applied. Although tend to coincide with the corresponding funda-
the case shown in Figure 6 indicates that a num- mental values, where there is a high percentage
ber of investors employ the strategy based on of fundamentalist investors. Also, Figure 9 shows
the fundamental values, the forecasting model that the number of fundamentalist investors are
employed by market investors is dependent on superior to the other kinds of investors. In addition,
the ratio of each type of investors, changing along
with circumstances such as trend forecasting and

139
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 3. Price transition

Figure 4. Transition of number of investors

Figure 5. Transition of average degree of overconfidence (Fundamentalist: Trend=500:500)

140
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 6. Transition of average number of overconfidence (Random)

Figure 7. Transition of average degree of investors (Random)

transaction prices almost match the fundamental hOW PASSIVE AND ACTIVE
value in this case. INVESTORS BEhAVE
Traditional finance argues that market sur-
vival is possible for those investors able to The series of experiments on the behaviors of
swiftly and accurately estimate both the risk and passive and active investors are divided into the
rate of return on stock, achieving market effi- two parts: First, we have fundamentalist agents
ciency. However, analysis results obtained here and passive-investment agents in the market to
regarding the influence irrational investors have investigate the influences of the two strategies.
on prices suggests a different situation, pointing Next, in order to analyze the effects, we introduce
to the difficulty of market modeling which takes the other kinds of investors, such as trend chasers,
real conditions into account.

141
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 8. Transition of average number of investors (Inverse Simulation)

Figure 9. Transition of average degree of overconfidence (Inverse Simulation)

loss over estimation investors, or overconfidence mark represents the fundamental value. Figure
investors. 11 depicts the histories of cumulative excess
returns of each investor. This graph shows that
Trading with Fundamentalist the fluctuation of the traded price agrees with the
and Passive Investors one of the fundamental value. The line with mark
x in Figure 11 shows the performance of passive
Figures 10 and 11 illustrate the case where there investment strategy and the dotted line shows the
exist the same 500 numbers of the two kinds of ones of fundamentalists. The performances of
investors (Case 0). Figure 10 shows the histo- fundamentalists are slightly different among them,
ries of stock prices. The solid line in Figure 10 because each of fundamentalists respectively has
represents the traded price and the line with x a predicting error. As the traditional asset pricing

142
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 10. Price transition(Case 0)

Figure 11. Cumulative excess returns(Case 0)

theory suggests, the trading prices are coincide and 15 are obtained by 100 experiments, each of
with the fundamental values and fundamentalist which consists of 3,000 simulation steps.
and passive investors can get the same profit in In Case 1, traded price changes in accordance
average. with fundamental value and both of investors
Next, using natural selection principles of coexist in the market. On the other hand, in Case
Genetic Algorithms (see the appendix 2 for detail), 2, traded price doesn’t reflect the fundamental
let the investor agents change their strategies when value and only passive investors can survive in
(1) the excess returns are under the target (e.g., the markets after around 1,600 time step. This
over 10%), (2) the excess returns are under 0%, result is quite different from the ones in Case 1.
and (3) the excess returns are too bad (e.g., under In Case 3, we have obtained the results similar to
10%). the ones in Case 2. These differences among each
The results of Case 1 are shown in Figures 12, experiment are brought about by the difference
13, 14, and 15. The results of Case 2 are shown of evaluation methods. In this sense, evaluation
in Figures 16, 17, 18, and 19. Figures 12, 13, 14, methods have a great influence on the financial
markets.

143
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 12. Price transition

Figure 13. Transition of number of investors

Figure 14. Distribution of fundamentalists

144
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 15. Distribution of fundamentalists

Figure 16. Price transition

Figure 17. Transition of number of investors

145
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 18. Distribution of fundamentalists

Figure 19. Distribution of fundamentalists

Throughout the experiments shown above, TRADING WITh


we have confirmed the effectiveness of passive FUNDAMENTALISTS, PASSIVE
investment strategy. Among them, the result in INVESTORS, AND OVER CONFIDENT
Figure 19 has indicated the superiority of passive INVESTORS, AND INVESTORS
strategy in more actual situation. However as is WITh PROSPECT ThEORY
shown in Case 2 and Figure 18, we have also
confirmed the unfavorable influence of passive This section describes the experimental results
investment on the market. with the five different investor agents: Funda-
mentalists, Passive Investors, Trend Chaser,
Investors with Prospect Theory, and Over Con-

146
Agent-Based Modeling Bridges Theory of Behavioral Finance and Financial Markets

Figure 20. Price transition (Case 4)

Figure 21. Cumulative excess returns (Case 4)

First, the results of Case 4 with 400 Fundamentalists, 400 Trend Chasers, and 200 Passive Investors are shown in Figures 20 and 21. Second, the results of Case 5 with 400 Fundamentalists, 400 Over Confident Investors, and 200 Passive Investors are shown in Figures 22 and 23. Third, the results of Case 6 with 400 Fundamentalists, 400 Investors with Prospect Theory, and 200 Passive Investors are shown in Figures 24 and 25.

In all cases, we have observed that passive investors keep their moderate positions positive, even when stock prices largely deviate from the fundamental value. In other words, the passive investment strategy is the most effective way if investors do not want to get the worst result in any case, even if they thereby fail to get the best result. In the asset management business, some investors indeed adopt passive investment to avoid the worst performance.

Figures 26 and 27 show the results where the agents are able to change their strategy when the excess returns are less than 0. In this experiment, we have slightly modified the natural selection rule described in the previous section. In the following experiments, investors change their strategy depending on their recent performance in the same way as in the previous section, and after that, investors change their strategy randomly with a small probability (0.01%), which corresponds to mutation in a genetic algorithm.


Figure 22. Price transition (Case 4)

Figure 23. Cumulative excess returns (Case 4)

Figure 24. Price transition (Case 5)


Figure 25. Cumulative excess returns (Case 5)

Figure 26. Price transition (Case 6)

The result shown in Figure 27 suggests that both fundamentalists and passive investors remain in the market and that they keep the market stable. The results shown in Figures 28 and 29 are quite different from the ones in Case 2. These results suggest that even slight differences in the market conditions and investors' behavior could cause large changes in the markets. In this sense, these results are thought-provoking.

SUMMARY AND CONCLUSION

This chapter has described our recent results on an Agent-Based Model for analyzing both microscopic and macroscopic associations in the financial market. We have found that (1) investors' overconfidence plays various roles in a financial market, (2) overconfident investors emerge in a bottom-up fashion in a market, (3) they contribute to the achievement of a market which adequately reflects fundamental values, (4) the passive investment


Figure 27. Cumulative excess returns (Case 6)

Figure 28. Price Transition

Figure 29. Transition of Number of Investors


strategy is valid in a realistic efficient market; however, it could have bad influences such as market instability and inadequate asset price deviations, and (5) under certain assumptions, the passive investment strategy and the active investment strategy could coexist in a financial market. These results have been described in more detail elsewhere (e.g., Takahashi & Terano, 2003, 2004, 2006a, 2006b; Takahashi, Takahashi, & Terano, 2007). Using the common simple framework presented in this chapter, we have found various interesting results, which may or may not coincide with both financial theory and real-world phenomena. We believe that the agent-based approach will be fruitful for future research on social systems, including financial problems, if we continue the effort to demonstrate its effectiveness (Terano, 2007a, 2007b; Takahashi, 2010).

REFERENCES

Arthur, W. B., Holland, J. H., LeBaron, B., Palmer, R. G., & Taylor, P. (1997). Asset Pricing under Endogenous Expectations in an Artificial Stock Market. In The Economy as an Evolving Complex System II (pp. 15–44). Addison-Wesley.

Axelrod, R. (1997). The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press.

Axtell, R. (2000). Why Agents? On the Varied Motivations for Agent Computing in the Social Sciences. The Brookings Institution Center on Social and Economic Dynamics Working Paper No. 17, November.

Bazerman, M. (1998). Judgment in Managerial Decision Making. John Wiley & Sons.

Black, F., & Litterman, R. (1992). Global Portfolio Optimization. Financial Analysts Journal, 48(5), 28–43. doi:10.2469/faj.v48.n5.28

Brunnermeier, M. K. (2001). Asset Pricing under Asymmetric Information. Oxford University Press. doi:10.1093/0198296983.001.0001

Epstein, J. M., & Axtell, R. (1996). Growing Artificial Societies: Social Science from the Bottom Up. MIT Press.

Fama, E. (1970). Efficient Capital Markets: A Review of Theory and Empirical Work. The Journal of Finance, 25, 383–417. doi:10.2307/2325486

Friedman, M. (1953). Essays in Positive Economics. University of Chicago Press.

Goldberg, D. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley.

Kahneman, D., & Tversky, A. (1979). Prospect Theory: An Analysis of Decision under Risk. Econometrica, 47, 263–291. doi:10.2307/1914185

Kahneman, D., & Tversky, A. (1992). Advances in Prospect Theory: Cumulative Representation of Uncertainty. Journal of Risk and Uncertainty, 5.

Kyle, A. S., & Wang, A. (1997). Speculation Duopoly with Agreement to Disagree: Can Overconfidence Survive the Market Test? The Journal of Finance, 52, 2073–2090. doi:10.2307/2329474

Levy, M., Levy, H., & Solomon, S. (2000). Microscopic Simulation of Financial Markets. Academic Press.

Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7, 77–91. doi:10.2307/2975974

Modigliani, F., & Miller, M. H. (1958). The Cost of Capital, Corporation Finance and the Theory of Investment. The American Economic Review, 48(3), 261–297.

Russell, S., & Norvig, P. (1995). Artificial Intelligence. Prentice-Hall.


Sharpe, W. F. (1964). Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk. The Journal of Finance, 19, 425–442. doi:10.2307/2977928

Shiller, R. J. (2000). Irrational Exuberance. Princeton University Press.

Shleifer, A. (2000). Inefficient Markets. Oxford University Press. doi:10.1093/0198292279.001.0001

Takahashi, H. (2010). An Analysis of the Influence of Fundamental Values' Estimation Accuracy on Financial Markets. Journal of Probability and Statistics, 2010.

Takahashi, H., Takahashi, S., & Terano, T. (2007). Analyzing the Influences of Passive Investment Strategies on Financial Markets via Agent-Based Modeling. In Edmonds, B., Hernandez, C., & Troitzsch, K. G. (Eds.), Social Simulation: Technologies, Advances, and New Discoveries (pp. 224–238). Hershey, PA: Information Science Reference.

Takahashi, H., & Terano, T. (2003). Agent-Based Approach to Investors' Behavior and Asset Price Fluctuation in Financial Markets. Journal of Artificial Societies and Social Simulation, 6(3).

Takahashi, H., & Terano, T. (2004). Analysis of Micro-Macro Structure of Financial Markets via Agent-Based Model: Risk Management and Dynamics of Asset Pricing. Electronics and Communications in Japan, 87(7), 38–48.

Takahashi, H., & Terano, T. (2006a). Emergence of Overconfidence Investors in Financial Markets. In Proceedings of the 5th International Conference on Computational Intelligence in Economics and Finance.

Takahashi, H., & Terano, T. (2006b). Exploring Risks of Financial Markets through Agent-Based Modeling. In Proc. SICE/ICASS 2006 (pp. 939–942).

Terano, T. (2007a). Exploring the Vast Parameter Space of Multi-Agent Based Simulation. In L. Antunes & K. Takadama (Eds.), Proc. MABS 2006 (LNAI 4442, pp. 1–14).

Terano, T. (2007b). KAIZEN for Agent-Based Modeling. In S. Takahashi, D. Sallach, & J. Rouchier (Eds.), Advancing Social Simulation: The First Congress (pp. 1–6). Springer Verlag.

Terano, T., Deguchi, H., & Takadama, K. (Eds.). (2003). Meeting the Challenge of Social Problems via Agent-Based Simulation: Post Proceedings of The Second International Workshop on Agent-Based Approaches in Economic and Social Complex Systems. Springer Verlag.

Terano, T., Nishida, T., Namatame, A., Tsumoto, S., Ohsawa, Y., & Washio, T. (Eds.). (2001). New Frontiers in Artificial Intelligence. Springer Verlag. doi:10.1007/3-540-45548-5

Tesfatsion, L. (2002). Agent-Based Computational Economics. Economics Working Paper No. 1, Iowa State University.

ENDNOTES

1. Graduate School of Business Administration, Keio University, 4-1-1 Hiyoshi, Yokohama, 223-8526, Japan. E-mail: [email protected]
2. Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259-J2-52 Nagatsuta-cho, Midori-ku, Yokohama, 226-8502, Japan. E-mail: [email protected]


APPENDICES

1. List of Parameters of the Proposed Model

The parameters used in the proposed model are summarized as follows:

M: the number of investors (1,000)

N: the number of issued stocks (1,000)

F_t^i: the total amount of assets of investor i at term t (F_0^i = 2,000; common to all investors)

W_t: the stock ratio in the market at term t (W_0 = 0.5)

w_t^i: the investment ratio of the stock of investor i at term t (w_0^i = 0.5; constant)

σ_y: the standard deviation of the profit fluctuation (0.2/√200; constant)

δ: the discount rate of the stock (0.1/200; constant)

λ: the degree of risk aversion of the investor (1.25; common, constant)

c: the adjustment coefficient for variance (0.01)

σ_h: the historical volatility of the stock (over the most recent 100 terms)

σ_n: the standard deviation of the dispersion of the short-term expected rate of return on the stock (0.01; common)

k: the adjustment coefficient for confidence (0.6)
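Collected in one place, as they might appear at the top of a simulation script, the parameters read as follows. This is only an illustrative sketch: the dictionary keys are ours, and the value of σ_y follows the 0.2/√200 reading adopted above.

    # Parameters of the proposed model, Appendix 1 (values as listed above).
    MODEL_PARAMS = {
        "n_investors": 1000,                       # M
        "n_stocks": 1000,                          # N
        "initial_assets": 2000,                    # F_0^i, common to all investors
        "initial_stock_ratio": 0.5,                # W_0
        "initial_investment_ratio": 0.5,           # w_0^i
        "profit_volatility": 0.2 / (200 ** 0.5),   # sigma_y (assumed reading)
        "discount_rate": 0.1 / 200,                # delta
        "risk_aversion": 1.25,                     # lambda
        "variance_adjustment": 0.01,               # c
        "history_length": 100,                     # terms used for sigma_h
        "short_term_return_dispersion": 0.01,      # sigma_n
        "confidence_adjustment": 0.6,              # k
    }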


2. Rules of Natural Selection Principle

This section explains the rules of the natural selection principle. The principle used in this chapter is composed of two steps: (1) selection of the investors who change their investment strategies and (2) selection of a new strategy. Each step is described in the following subsections.

Selection of Investors Who Change Their Investment Strategies

After 25 terms have passed since the market started, each investor decides at regular intervals (every five terms) whether he/she changes strategy. The decision is made depending on the cumulative excess return during the recent five terms, and investors who obtain smaller returns change their strategy with higher probability. To be more precise, an investor who obtains a negative cumulative excess return changes strategy with the following probability:

p_i = \max\left(0.3 - a \cdot e^{\,r_i^{cum}},\; 0\right),

where

p_i: the probability with which investor i changes his/her own strategy,

r_i^{cum}: the cumulative excess return of investor i during the recent 5 terms,

a: the coefficient for the evaluation criterion (0.2, 0.3, 0.4).

Selection of New Strategy

We apply the method of genetic algorithms (Goldberg, 1989) to the selection rule for the new strategy. The investors who change their strategy tend to select a strategy that has brought a positive cumulative excess return. The probability of selecting s_i as the new strategy is given as

p_i = e^{\,r_i^{cum}} \Big/ \sum_{j=1}^{M} e^{\,r_j^{cum}},

where r_i^{cum} is the cumulative excess return of each investor.
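For concreteness, the two selection steps above can be condensed into a few lines of Python. This is only an illustrative reading of the rules and formulas in this appendix; the agent and market classes of the actual model are not shown, and the function and key names are ours.

    import math
    import random

    def change_probability(r_cum, a=0.3):
        # Step 1: an investor with negative cumulative excess return r_cum
        # changes strategy with probability max(0.3 - a * exp(r_cum), 0);
        # a is one of 0.2, 0.3, 0.4.
        return max(0.3 - a * math.exp(r_cum), 0.0)

    def select_new_strategy(investors):
        # Step 2: strategies of investors with high cumulative excess returns
        # are selected with probability e^{r_i} / sum_j e^{r_j}.
        weights = [math.exp(inv["r_cum"]) for inv in investors]
        chosen = random.choices(investors, weights=weights, k=1)[0]
        return chosen["strategy"]

    def natural_selection(investors, term, a=0.3):
        # Applied every five terms, after the first 25 terms of the market.
        if term < 25 or term % 5 != 0:
            return
        for inv in investors:
            if inv["r_cum"] < 0 and random.random() < change_probability(inv["r_cum"], a):
                inv["strategy"] = select_new_strategy(investors)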

Section 4
Multi-Agent Robotics

Chapter 9
Autonomous Specialization in
a Multi-Robot System using
Evolving Neural Networks
Masanori Goka
Hyogo Prefectural Institute of Technology, Japan

Kazuhiro Ohkura
Hiroshima University, Japan

ABSTRACT
Artificial evolution has been considered as a promising approach for coordinating the controller of an
autonomous mobile robot. However, it is not yet established whether artificial evolution is also effective
in generating collective behaviour in a multi-robot system (MRS). In this study, two types of evolving
artificial neural networks are utilized in an MRS. The first is the evolving continuous time recurrent neu-
ral network, which is used in the most conventional method, and the second is the topology and weight
evolving artificial neural networks, which is used in the noble method. Several computer simulations
are conducted in order to examine how the artificial evolution can be used to coordinate the collective
behaviour in an MRS.

DOI: 10.4018/978-1-60566-898-7.ch009

INTRODUCTION

Artificial evolution is one of the emerging approaches in the design of controllers for autonomous mobile robots. In general, a robot controller, which is represented as an evolving artificial neural network (EANN), is evolved in a simulated or a physical environment such that it exhibits the behaviour required to perform a certain task. The field of research on autonomous robots with EANNs is called evolutionary robotics (ER) (Cliff et al., 1993; Harvey et al., 2005).

There has been a great deal of interest in EANNs. A good summary of the EANN research carried out until 1999 can be found in the study of Yao (1999). Traditionally, EANNs have been classified into the following three categories on the basis of their network structure:

• the network structure is fixed and the connection weights evolve;
• the network structure evolves and the connection weights are trained by learning;


Figure 1. Cooperative package pushing problem


• the network structure and the connection weights evolve simultaneously.

The first type of network structure corresponds to the most conventional approach in the field of EANNs. A typical application is presented in the study of Mondada and Floreano (1995). Recently, continuous time recurrent neural networks (CTRNNs) have been frequently used for evolving autonomous robots (Beer, 1996; Blynel & Floreano, 2003).

In this chapter, we also consider the network structure of the third type, called topology and weight evolving artificial neural networks (TWEANNs), because the evolvability of the corresponding approach is the largest among those of the three approaches (Stanley & Miikkulainen, 2002). Thus far, many TWEANN approaches such as GNARL (Angeline et al., 1994), EPNet (Liu & Yao, 1996), ESP (Gomez & Miikkulainen, 1999) and NEAT (Stanley & Miikkulainen, 2002) have been proposed. We are motivated by their work and have also developed a robust and efficient approach to EANN design, called MBEANN (Ohkura et al., 2007). In the remainder of this chapter, we deal with the following two approaches: the evolving CTRNN (eCTRNN) and MBEANN.

The evolutionary control of multi-robot systems (MRS) has not been well discussed. Only a few papers have been published thus far; however, some basic trial results such as those presented in the studies of Triani (2007) and Acerbi (2007) have recently been published. This implies that although artificial evolution is a promising approach to behaviour coordination for a single robot, it is not very easy to evolve an MRS for developing a type of adaptively specialized team play.

Here, we mention two epochal papers. Baldassarre et al. (2003) evolved four simulated Khepera robots to develop a flocking formation and then discussed the variation in the formation patterns with changes in the fitness function. They used simple two-layered feed-forward EANNs as robot controllers and a binary Genetic Algorithm (GA) in which a real value is encoded with eight bits, with their original reproduction procedure.

Quinn et al. (2001) evolved a physical robot group by adopting the so-called Sussex approach (Harvey and Husbands, 1997). Specifically, they first evolved the robot controllers in a simulated environment, and then, after achieving sufficiently good performance, they conducted physical experiments on the evolved controllers. They used their original recurrent EANNs as robot controllers and a steady-state real-coded GA.

In this chapter, we solve the task shown in Figure 1: ten autonomous mobile robots have to push three packages to the goal line, which is drawn on the right side of the floor. The packages are assumed to be too heavy for a single robot to move. In order to move a package, the robots have to cooperate with each other to gather at a certain side of the package and push it in the same direction, as illustrated in Figure 2.

This chapter is organized as follows. In the next section, the task and our objectives are introduced and described in detail. In Section 3, the computer simulations performed for eCTRNN and MBEANN are described. The achieved behaviour is discussed from the viewpoint of autonomous specialization. Then, the chapter is concluded in the last section.


Figure 2. Package being pushed by robots

Table 1. Profit table

  Contact between robot and package    +100
  Package's final position             +x
  Successful transport                 +1000
  Steps required                       -step
  Collision                            -P

COOPERATIVE PACKAGE PUSHING PROBLEM

Figure 1 shows the cooperative package pushing problem involving ten autonomous mobile robots. In the initial state, the ten robots are placed at equal intervals on the left side of the floor. The objective is to push all three packages, which are arranged as shown, to the right side of the floor within 2,000 time steps.

It is assumed that the three packages have weights such that they cannot be moved by a single robot. To move the heaviest package, which is the largest of the packages shown in Figure 1, five robots are required. Further, it is obvious that the five robots have to push the package in the same direction to move it. Similarly, it is assumed that to move the other two packages, three and two robots, respectively, are required.

The ten autonomous robots have the same specifications, as shown in Figure 2. A robot has six IR sensors on its front, two IR sensors on its back, an omnidirectional vision camera at the center of its body, and a compass to detect its global direction. A robot also has two wheels for movement. The information obtained by these sensors serves as the inputs for the ANN controller. The output layer has two neurons, which correspond to the left and right motors. The details of the neural controller are given below.

All the robots are assumed to have an identical ANN controller. The behaviour of each robot will nevertheless be different because the robots are placed at different positions. A group of robots collects points as shown in Table 1. The group is awarded 100 points when any of the robots touches one of the packages. The group is awarded points for moving packages according to the distances between the initial positions and the final positions in the direction of the goal line. The group is awarded 1,000 points each time it successfully pushes a package to the goal line.

On the other hand, a group loses points as the number of time steps required increases. It can also lose points when robots collide with each other. In the following computer simulations, the collision penalty is set as P = 0 and P = 5.0 × 10⁻⁶ = ∆P, because it was found in preliminary experiments that the behaviour of robot groups is very sensitive to P, i.e., the groups show very different results for slight changes in P.

Figure 3. Specifications of a robot

Table 2. Success rates in eCTRNN

  Penalty (-P / collision)    Success rate [%]
  0.0                         90
  5.0 × 10⁻⁶ = ∆P             30

COMPUTER SIMULATIONS

eCTRNN

The eCTRNN robot controller has an input layer consisting of 16 neurons: eight neurons correspond to the eight IR sensors, three neurons are dedicated to the package nearest to the robot in terms of the polar representation, three neurons are dedicated to the robot nearest in terms of the polar representation, and two neurons are used for representing the global direction, as shown in Figure 3. The output layer consists of two neurons, which connect to the left and right motors, respectively, as shown in the same figure. In this study, the hidden layer is assumed to have four neurons, each of which has a synaptic connection with all the neurons in the layer. Consequently, each neuron has a recurrent connection to itself.
as shown in the same figure. In this study, the hid-
cessful runs. Figure 7(c) and (d) show the transi-
den layer is assumed to have four neurons, each
tions of the number of required time steps in the
of which has a synaptic connection with all the
successful runs. The light gray lines show the
neurons in the layer. Consequently, each neuron
averages of nine and three successful runs, re-
has a recurrent connection to itself.
spectively.
We soon found that when P = 0, the robot
Experimental Results
group showed stable performance in all the suc-
cessful runs after about the 400th generation.
The eCTRNN was evolved by using (10, 70)-ES,
Similarly, when P =∆P, stable performance was
i.e., by using the standard evolution strategies
observed in two of three runs at around the 250th
(Bäck, 1996) in which 10 individuals generate 70
generation; further, in one run did not perform
offspring with an equal probability and only the
stably until the last generations. In Figure 7(e)
top ten offspring are allowed to survive in the next
and (f), indicate whether the three packages were
generation. Ten independent runs were conducted.
transported to the goal line within the time-step
The results of the computer simulations are
limit. We also soon found that the robot group
summarized in Table 2. The success rates clearly
pushed the three packages to the goal line after
indicate that the task is more difficult when P =
around 100 and 150 generation when P = 0 and
∆ P than when P = 0.
when P =∆P, respectively.
Figure 5 and Figure 6 show the collective
Next, let us investigate a typical run to deter-
behaviour of the robot group in the last generation
mine the strategy that the robot group developed
of a successful run when P = 0 or P = ∆P, respec-
by artificial evolution. Figure 8 shows graphs for
tively. The robots showed almost the same col-
ten robots at the 500th generation in the cases
lective behaviour for the two value of P: they
when P = 0 and
pushed packages towards the lower right region,
P =∆P. Both the graphs have ten lines, each
presumably to avoid collisions between pack-
of which corresponds to a robot. The horizontal
ages.
axes represent time steps, which begin at 0 and


Figure 4. The eCTRNN robot controller

A black dot is marked when a robot is in contact with the largest package. Similarly, a gray dot or a light gray dot is marked when a robot is in contact with the midsize package or the smallest package, respectively. As observed in Figure 8(a), the robots showed highly sophisticated team play to successfully and almost simultaneously push all the packages to the goal line. Robots No. 1 and No. 2 pushed the smallest package, Robots No. 3, 4, and 5 pushed the midsize package, and the others pushed the largest package to the goal line. The robots not only pushed the three packages but also coordinated their behaviour so that the packages did not collide with each other. Figure 8(b) illustrates the similar results that were obtained in the case where P = ∆P. In the last generation, the robots achieved almost the same team play as in the case where P = 0.

MBEANN

Recently, our research group proposed a novel EANN for TWEANNs; this EANN is called MBEANN (Ohkura et al., 2007). Here, the MBEANN is briefly introduced for an understanding of some of its important characteristics.

The MBEANN is a type of TWEANN in which no crossover is used. Instead, the MBEANN adopts two types of structural mutations that work neutrally or almost neutrally in terms of the fitness value. In order to utilize these special structural mutations, an individual is represented as a set of subnetworks, i.e., as a set of modules called operons. Figure 9 shows the concept of the genotype. As shown in the figure, the node information consists of the node type and the node identification number. The link information consists of the input node, the output node, the weight value, and the link identification number. The two identification numbers should be unique only to each individual.


Figure 5. The achieved collective behavior when P = 0 for eCTRNN


Figure 6. The achieved collective behavior when P = ∆P for eCTRNN


Figure 7. The experimental results when P = 0 (the left three graphs) and P = ∆P (the right three graphs)
for eCTRNN


Figure 8. The achieved collective behavior by eCTRNN at the last generation

Figure 9. Conceptual representation of genotype for MBEANN


Figure 10. Initial controller for MBEANN

Thus, if I is the maximum number of operons, a genotype string is formulated as follows:

  string = {operon_0, operon_1, ..., operon_I}

  operon_i = {{node_j | j ∈ ON_i}, {link_k | k ∈ OL_i}}

where ON_i and OL_i denote the sets of identification numbers of the nodes and links contained in operon_i.
i i

As for an initial population, since MBEANN


starts with the population consisting of the initial Experimental Results
individuals having only one operon, operon0, i.e.,
the minimal structure in which there is no hidden Similar to the evolution of the eCTRNN in the
node, as shown in Figure 10. This is identical to the experiments, the MBEANN was evolved using
case of NEAT (Stanley and Miikkulainen, 2002). (10, 70)-ES. The conditions in the computer simu-
lations were the same as those in the eCTRNN ex-
periments. Ten independent runs were conducted
for the two cases where P = 0.0 and P = ∆P. The
results are summarized in Table 4. We found that at


Table 4. Success rates in MBEANN

  Penalty (-P × step)    Success rate [%]
  0.0                    80
  ∆P                     30

We found that at the last generation, the MBEANN showed almost the same success rate as the eCTRNN.

Figure 11 and Figure 12 show the collective behaviour of the robot group at the final generation of a successful run when P = 0 and P = ∆P, respectively. In the case where P = 0, the robot group showed almost the same collective behaviour as observed in the case of eCTRNN: the robots pushed the packages toward the lower right region. On the other hand, in the case where P = ∆P, it was observed that the robots used a completely different strategy. First, they found a partner to move around with. After making five pairs, one of the pairs moved toward the smallest package in an almost straight line, even though they could not see the smallest package, because the neural controller of each robot received the sensory information only for the nearest package from the omnivision camera. In addition, two of the robots started moving toward the midsized package. During the movement toward the package, they came into contact with the largest package; one of them remained at this location. The other three robots went on to push the midsized package toward the goal line. One robot and the remaining two pairs of robots started to push the largest package toward the goal line. This highly sophisticated collective behavior is the result of "autonomous specialization".

Figure 13 shows the transitions of the artificial evolution observed in terms of the fitness values, required steps, and finished packages. In the figure, (a) and (b) show the transitions of the fitness values in all the runs. In the case where P = 0, the robot group succeeded in transporting all three packages in eight runs. In the other case (P = ∆P), the robot group succeeded in only three runs. The black lines in both graphs show the average fitness transitions calculated using only the successful runs. Figures 13(c) and (d) show the transitions of the required time steps in the successful runs. The gray lines show the averages of the eight and three successful runs, respectively. We found that, generally, the performance was poorer than in the case of eCTRNN, particularly in the early generations. When P = 0, the robot group did not show stable performance in all the successful runs until the last few generations. On the contrary, when P = ∆P, very stable performance was observed in the three successful runs after around the 270th generation. Figures 13(e) and (f) indicate whether the three packages were transported to the goal line within the time-step limit. We also found that the robot group stably pushed the three packages to the goal line after around the 420th and 270th generations when P = 0 and when P = ∆P, respectively.

We suppose that the initial topological structure was too simple for evolutionary learning to solve this complex task, and thus, the robot group needed around 200 generations to obtain sufficient topological complexity, i.e., the ability to solve this task. This might be validated by Figure 15 or Figure 16, which show the structural evolution of the robot's neural controller in a typical run. As indicated by (a) and (b), which show the controller at the 100th generation, or (c) and (d), which show the controller at the 300th generation, the neural network seems too simple to solve the task in comparison to the eCTRNN robot controller. The topological structures at the last generation are shown in (e) and (f) of the same figure; these controllers might be sufficiently complex.

Next, similar to the case of eCTRNN, let us investigate a typical run to determine the strategy

Figure 11. The achieved collective behavior when P = 0 for MBEANN


Figure 12. The achieved collective behavior when P = ∆P for MBEANN


Figure 13. The experimental results when P = 0 (the left three graphs) and P= ∆P (the right three
graphs) for MBEANN


Figure 14. The achieved collective behavior by MBEANN at the last generation

Figure 15. A typical result of the structural evolution of the neural network controller for MBEANN


Figure 16. A typical result of the structural evolution of the neural network controller for MBEANN

the robot group developed by artificial evolution. Figure 14 shows graphs for the ten robots at the 500th generation in the cases where P = 0 and P = ∆P. These graphs are drawn in a manner similar to those in Figure 8. As seen in graph (a), the robot group showed almost the same team play as in the case of eCTRNN. On the other hand, as explained above, the effect of autonomous specialization is clearly observed in graph (b): Robots No. 1 and No. 2 pushed the smallest package in a straight line, Robots No. 3, No. 4, and No. 9 pushed the midsize package, and the remaining five robots pushed the largest package to the goal line.

CONCLUSION

In this chapter, two approaches in EANN, called eCTRNN and MBEANN, were discussed in the


context of the cooperative package pushing problem. Ten autonomous mobile robots successfully showed sophisticated collective behaviour as a result of autonomous specialization.

As the next step, the extension of the algorithm for artificial evolution in order to improve the success rate must be considered. Second, more computer simulations should be performed in order to examine the validity and the robustness of the ER approach in coordinating cooperative behaviour in an MRS, since the complexity of the cooperative package pushing problem can be easily varied by changing the number of robots or the arrangement of the packages.

REFERENCES

Acerbi, A., et al. (2007). Social Facilitation on the Development of Foraging Behaviors in a Population of Autonomous Robots. In Proceedings of the 9th European Conference on Artificial Life (pp. 625-634).

Angeline, P. J., Saunders, G. M., & Pollack, J. B. (1994). An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5, 54–65. doi:10.1109/72.265960

Bäck, T. (1996). Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press.

Baldassarre, G., Nolfi, S., & Parisi, D. (2003). Evolving Mobile Robots Able to Display Collective Behaviours. Artificial Life, 9(3), 255–267. doi:10.1162/106454603322392460

Beer, R. D. (1996). Toward the Evolution of Dynamical Neural Networks for Minimally Cognitive Behavior. In From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior (pp. 421-429).

Blynel, J., & Floreano, D. (2003). Exploring the T-Maze: Evolving Learning-Like Robot Behaviors using CTRNNs. In Proceedings of the 2nd European Workshop on Evolutionary Robotics (EvoRob'2003) (LNCS).

Cliff, D., Harvey, I., & Husbands, P. (1993). Explorations in Evolutionary Robotics. Adaptive Behavior, 2(1), 71–104. doi:10.1177/105971239300200104

Gomez, F. J., & Miikkulainen, R. (1999). Solving Non-Markovian Control Tasks with Neuroevolution. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 1356-1361).

Harvey, I., Di Paolo, E., Wood, R., Quinn, M., & Tuci, E. (2005). Evolutionary Robotics: A New Scientific Tool for Studying Cognition. Artificial Life, 11(3/4), 79–98. doi:10.1162/1064546053278991

Harvey, I., Husbands, P., Cliff, D., Thompson, A., & Jakobi, N. (1997). Evolutionary robotics: The Sussex approach. Robotics and Autonomous Systems, 20, 205–224. doi:10.1016/S0921-8890(96)00067-X

Liu, Y., & Yao, X. (1996). A Population-Based Learning Algorithm Which Learns Both Architectures and Weights of Neural Networks. Chinese Journal of Advanced Software Research, 3(1), 54–65.

Mondada, F., & Floreano, D. (1995). Evolution of neural control structures: Some experiments on mobile robots. Robotics and Autonomous Systems, 16(2-4), 183–195. doi:10.1016/0921-8890(96)81008-6

Ohkura, K., Yasuda, T., Kawamatsu, Y., Matsumura, Y., & Ueda, K. (2007). MBEANN: Mutation-Based Evolving Artificial Neural Networks. In Proceedings of the 9th European Conference on Artificial Life (pp. 936-945).


Quinn, M., & Noble, J. (2001). Modelling Animal Behaviour in Contests: Tactics, Information and Communication. In Advances in Artificial Life: Sixth European Conference on Artificial Life (ECAL 01) (LNAI).

Stanley, K., & Miikkulainen, R. (2002). Evolving neural networks through augmenting topologies. Evolutionary Computation, 10(2), 99–127. doi:10.1162/106365602320169811

Triani, V., et al. (2007). From Solitary to Collective Behaviours: Decision Making and Cooperation. In Proceedings of the 9th European Conference on Artificial Life (pp. 575-584).

Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE, 87(9), 1423–1447. doi:10.1109/5.784219


Chapter 10
A Multi-Robot System
Using Mobile Agents with
Ant Colony Clustering
Yasushi Kambayashi
Nippon Institute of Technology, Japan

Yasuhiro Tsujimura
Nippon Institute of Technology, Japan

Hidemi Yamachi
Nippon Institute of Technology, Japan

Munehiro Takimoto
Tokyo University of Science, Japan

ABSTRACT
This chapter presents a framework using novel methods for controlling multiple mobile robots directed
by mobile agents on communication networks. Instead of physical movement of multiple robots, mobile
software agents migrate from one robot to another so that the robots more efficiently complete their
task. In some applications, it is desirable that multiple robots draw themselves together automatically.
In order to avoid excessive energy consumption, we employ mobile software agents to locate robots
scattered in a field, and cause them to autonomously determine their moving behaviors by using a clus-
tering algorithm based on the Ant Colony Optimization (ACO) method. ACO is the swarm-intelligence-
based method that exploits artificial stigmergy for the solution of combinatorial optimization problems.
Preliminary experiments have provided a favorable result. Even though there is much room to improve
the collaboration of multiple agents and ACO, the current results suggest a promising direction for the
design of control mechanisms for multi-robot systems. In this chapter, we focus on the implementation
of the controlling mechanism of the multi-robot system using mobile agents.

DOI: 10.4018/978-1-60566-898-7.ch010


INTRODUCTION

When we pass through the terminals of an airport, we often see carts scattered in the walkways and laborers manually collecting them one by one. It is a dull and laborious task. It would be much easier if carts were roughly gathered in any way before collection. Multi-robot systems have made rapid progress in various fields, and the core technologies of multi-robot systems are now easily available (Kambayashi & Takimoto, 2005). Employing one of those technologies, it is possible to give each cart minimum intelligence, making each cart an autonomous robot. We realize that for such a system cost is a significant issue, and we address one of those costs, the power source. A powerful battery is big, heavy and expensive; therefore for such intelligent cart systems small batteries are desirable, since energy saving is an important issue in such a system (Takimoto, Mizuno, Kurio & Kambayashi, 2007; Nagata, Takimoto & Kambayashi, 2009).

To demonstrate our method, we consider the problem of airport luggage carts that are picked up by travelers at designated points and left in arbitrary places (Kambayashi, Sato, Harada, & Takimoto, 2009). It is desirable that intelligent carts (intelligent robots) draw themselves together automatically to make their collection more efficient. A simple implementation would be to give each cart a designated assembly point to which it automatically returns when free. It is easy to implement, but some carts would have to travel a long way back to their assigned assembly point, even if they found themselves located close to another assembly point. The additional distance traveled consumes unnecessary energy, since the carts are functionally identical.

To ameliorate the situation, we employ mobile software agents to locate robots scattered in a field, e.g. an airport, and make them autonomously determine their moving behavior using a clustering algorithm based on ant colony optimization (ACO). ACO is a swarm intelligence-based method and a multi-agent system that exploits artificial stigmergy for the solution of combinatorial optimization problems. Preliminary experiments yield a favorable result. Ant colony clustering (ACC) is an ACO specialized for clustering objects. The idea is inspired by the collective behaviors of ants, and Deneubourg formulated an algorithm that simulates the ant corps gathering and brood sorting behaviors (Deneuburg, Goss, Franks, Sendova-Franks, Detrain & Chretien, 1991).

We have studied a few applications of the base idea for controlling mobile multiple robots connected by communication networks (Ugajin, Sato, Tsujimura, Yamamoto, Takimoto & Kambayashi, 2007; Sato, Ugajin, Tsujimura, Yamamoto & Kambayashi, 2007; Kambayashi, Tsujimura, Yamachi, Takimoto & Yamamoto, 2009). The framework provides novel methods to control coordinated systems using mobile agents. Instead of physical movement of multiple robots, mobile software agents can migrate from one robot to another so that they can minimize energy consumption in aggregation. In this chapter, we describe the details of implementation of the multi-robot system using multiple mobile agents and static agents that implement ACO. The combination of the mobile agents augmented by ACO and mobile multiple robots opens a new horizon of efficient use of mobile robot resources. We report here our experimental observations of our simulation of our ACC implementation.

Quasi-optimal robot collection is achieved in three phases. The first phase collects the positions of robots. One mobile agent issued from the host computer visits scattered robots one by one and collects the position of each. The precise coordinates and orientation of each robot are determined by sensing RFID (Radio Frequency Identification) tags under the floor carpet. Upon the return of the position collecting agent, the second phase begins, wherein another agent, the simulation agent, performs the ACC algorithm and produces the quasi-optimal gathering positions for the robots. The simulation agent is a static agent that resides in the host computer.


In the third phase, a number of mobile agents are issued from the host computer. Each mobile agent migrates to a designated robot, and directs the robot to the assigned quasi-optimal position that was calculated in the second phase. The assembly positions (clustering points) are determined by the simulation agent. They are influenced, but not determined, by the initial positions of the scattered robots. Instead of implementing ACC with actual robots, one static simulation agent performs the ACC computation, and then the set of produced positions is distributed by mobile agents. Therefore our method eliminates unnecessary physical movement and thus provides energy savings.

The structure of the balance of this chapter is as follows. In the second section, we review the history of research in this area. The third section describes the agent system that performs the arrangement of the multiple robots. The fourth section describes the ACC algorithm we have employed to calculate the quasi-optimal assembly positions. The fifth section demonstrates the feasibility of our system by describing the implementation of an actual multi-robot system in which each robot orients itself using RFID tags in its environment. The sixth section discusses our quantitative experiments and our observations from the preliminary experiments. Finally, we conclude in the seventh section and discuss future research directions.

BACKGROUND

Kambayashi and Takimoto have proposed a framework for controlling intelligent multiple robots using higher-order mobile agents (Kambayashi & Takimoto, 2005; Takimoto, Mizuno, Kurio & Kambayashi, 2007; Nagata, Takimoto & Kambayashi, 2009). The framework helps users to construct intelligent robot control software by migration of mobile agents. Since the migrating agents are of higher order, the control software can be hierarchically assembled while it is running. Dynamic extension of control software by the migration of mobile agents enables the controlling agent to begin with relatively simple base control software, and to add functionalities one by one as it learns the working environment. Thus we do not have to make the intelligent robot smart from the beginning or make the robot learn by itself. The controlling agent can send intelligence later through new agents. Even though the dynamic extension of the robot control software using the higher-order mobile agents is extremely useful, such a higher-order property is not necessary in our setting. We have therefore employed a simple, non-higher-order mobile agent system for our framework. We previously implemented a team of cooperative search robots to show the effectiveness of such a framework, and demonstrated that the framework contributes to energy savings for a task achieved by multiple robots (Takimoto, Mizuno, Kurio & Kambayashi, 2007; Nagata, Takimoto & Kambayashi, 2009). Our simple agent system should achieve similar performance.

Deneuburg formulated the biologically inspired behavioral algorithm that simulates the ant corps gathering and brood sorting behaviors (Deneuburg, Goss, Franks, Sendova-Franks, Detrain, & Chretien, 1991). His algorithm captured many features of the ant sorting behaviors. His design consists of ants picking up and putting down objects in a random manner. He further conjectured that robot team design could be inspired by the ant corps gathering and brood sorting behaviors (Deneuburg, Goss, Franks, Sendova-Franks, Detrain, & Chretien, 1991). Wang and Zhang proposed an ant-inspired approach along this line of research that sorts objects with multiple robots (Wang & Zhang, 2004).

Lumer improved Deneuburg's model and proposed a new simulation model that was called Ant Colony Clustering (Lumer & Faieta, 1994). His method could cluster similar objects into a few groups. He presented a formula that measures the


similarity between two data objects and designed an algorithm for data clustering. Chen et al. have further improved Lumer's model and proposed the Ants Sleeping Model (ASM) (Chen, Xu & Chen, 2004). The artificial ants in Deneuburg's model and Lumer's model make a considerable number of random idle moves before they pick up or put down objects, and considerable repetition occurs during these random idle moves. In Chen's ASM model, an ant has two states: an active state and a sleeping state. When the artificial ant locates a comfortable and secure position, it has a higher probability of being in the sleeping state. Based on ASM, Chen has proposed an Adaptive Artificial Ants Clustering Algorithm that achieves better clustering quality with less computational cost.

Algorithms inspired by the behaviors of social insects such as ants that communicate with each other by stigmergy are becoming popular (Dorigo & Gambardella, 1996) and widely used in solving complex problems (Toyoda & Yano, 2004; Becker & Szczerbicka, 2005). Upon observing real ants' behaviors, Dorigo et al. found that ants exchanged information by laying down a trail of a chemical substance (pheromone) that is followed by other ants. They adopted this ant strategy, known as ant colony optimization (ACO), to solve various optimization problems such as the traveling salesman problem (TSP) (Dorigo & Gambardella, 1996). Our ACC algorithm employs pheromone instead of Euclidian distance to evaluate its performance.

THE MOBILE AGENTS

Robot systems have made rapid progress not only in their behaviors but also in the way they are controlled (Murphy, 2000). Multi-agent systems introduced modularity, reconfigurability and extensibility to control systems, which had traditionally been monolithic. This has made easier the development of control systems in distributed environments such as intelligent multi-robot systems.

On the other hand, excessive interactions among agents in a multi-agent system may cause problems in the multi-robot environment. Consider a multi-robot system where each robot is controlled by an agent, and interactions among robots are achieved through a communication network such as a wireless LAN. Since the circumstances around the robots change as the robots move, the condition of each connection among the various robots also changes. In this environment, when some of the connections in the network are disabled, the system may not be able to maintain consistency among the states of the robots. Such problems have a tendency to increase as the number of interactions increases.

In order to lessen the problems of excessive communication, mobile agent methodologies have been developed for distributed environments. In a mobile agent system, each agent can actively migrate from one site to another. Since a mobile agent can bring the necessary functionalities with it and perform its tasks autonomously, it can reduce the necessity for interaction with other sites. In the minimal case, a mobile agent requires that the connection be established only when it performs migration (Binder, Hulaas & Villazon, 2001). Figure 1 shows a conceptual diagram of a mobile agent migration. This property is useful for controlling robots that have to work in a remote site with unreliable or intermittent communication. The concept of a mobile agent also creates the possibility that new functions and knowledge can be introduced to the entire multi-agent system from a host or controller outside the system via a single accessible member of the intelligent multi-robot system (Kambayashi & Takimoto, 2005).

Our system model consists of robots and a few kinds of static and mobile software agents. All the controls for the mobile robots, as well as the ACC computation performed in the host computer, are achieved through the static and mobile agents. They are: 1) user interface agent (UIA), 2) operation agents (OA), 3) position collecting agent


Figure 1. A mobile agent is migrating from site A to site B

Figure 2. Cooperative agents to control a mobile robot

(PCA), 4) clustering simulation agent (CSA), and 5) driving agents (DA). All the software agents except UIA and CSA are mobile agents. A mobile agent (PCA) traverses the robots scattered in the field to collect their coordinates. After receiving the assembly positions computed by a static agent (CSA), many mobile agents (DAs) migrate to the robots and drive them to the assembly positions. Figure 2 shows the interactions of the cooperative agents that control a mobile robot.

The functionality of each agent is described as follows:

1. User Interface Agent (UIA): The user interface agent (UIA) is a static agent that resides on the host computer and interacts with the user. It is expected to coordinate the entire agent system. When the user creates this agent with a list of IP addresses of the mobile robots, UIA creates PCA and passes the list to it.
2. Operation Agent (OA): Each robot has at least one operation agent (OA). It has the task that the robot on which it resides is supposed to perform. Each mobile robot has its own OA. Currently all operation agents (OA) have a function for collision avoidance and a function to sense RFID tags, which are embedded in the floor carpet, to detect the robot's precise coordinates in the field.
3. Position Collecting Agent (PCA): A distinct agent called the position collecting agent (PCA) traverses the mobile robots scattered in the field to collect their coordinates.


PCA is created and dispatched by UIA. Upon returning to the host computer, it hands the collected coordinates to the clustering simulation agent (CSA) for ACC.
4. Clustering Simulation Agent (CSA): The host computer houses the static clustering simulation agent (CSA). This agent actually performs the ACC algorithm, using the coordinates collected by PCA as the initial positions, and produces the quasi-optimal assembly positions of the mobile robots. Upon terminating the computation, CSA creates a number of driving agents (DA).
5. Driving Agent (DA): The quasi-optimal arrangement coordinates produced by the CSA are delivered by driving agents (DA). One driving agent is created for each mobile robot, and it contains the set of procedures for that mobile robot. The DA drives its mobile robot to the designated assembly position.
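Put together, the division of labour among these five agents amounts to the control flow below. This is a plain-Python paraphrase written only for this summary; it is not Agent Space code, and the function names are ours. The three callables stand in for behaviour the chapter describes in prose.

    def collect_and_gather(ip_addresses, query_position, acc_simulation, drive_robot):
        """One collection cycle: PCA collects positions, CSA plans, DAs drive."""
        # Phase 1: the position collecting agent visits every robot in the IP list
        # and records the coordinates reported by that robot's operation agent.
        positions = {ip: query_position(ip) for ip in ip_addresses}
        # Phase 2: the static clustering simulation agent runs the ACC algorithm
        # on the collected coordinates and returns one assembly position per robot.
        assembly = acc_simulation(positions)
        # Phase 3: one driving agent per robot delivers the movement procedure
        # and drives the robot to its assigned assembly position.
        for ip, goal in assembly.items():
            drive_robot(ip, goal)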
OA detects the current coordinates of the robot on which it resides. Each robot has its own IP address, and UIA hands the list of IP addresses to PCA. First, PCA migrates to an arbitrary robot and starts hopping between the robots one by one. It communicates locally with each OA, and writes the coordinates of the robot into its own local data area. When PCA has obtained all the coordinates of the robots, it returns to the host computer. UIA waits a certain period for PCA's return. If UIA does not hear from PCA for a certain period, it declares a "time-out" and cancels the PCA. Then UIA re-generates a new PCA with a new identification number. On the other hand, if PCA cannot find a robot with one of the IP addresses on the list, it retries a certain number of times and then declares the missing robot to be "lost." PCA reports that fact to UIA. Upon returning to the host computer, PCA creates CSA and hands the coordinate data to CSA, which computes the ACC algorithm. We employ RFID (Radio Frequency Identification) tagging to get precise coordinates. We set RFID tags in a regular grid under the floor carpet tiles. The tags we chose have a small range so that the position-collecting agent can obtain fairly precise coordinates from each tag. The robots also have a basic collision avoidance mechanism using infrared sensors.

CSA is the other static agent; its sole role is ACC computation. When CSA receives the coordinate data of all the robots, it translates them into coordinates for simulation, and performs the clustering. When CSA finishes the computation and produces a set of assembly positions, it creates the set of procedures for autonomous robot movements.

CSA creates the DAs that convey the sets of procedures to the mobile robots. Each DA receives its destination IP address from PCA, and the set of procedures for the destination robot, and then migrates to the destination robot. Each DA has a set of driving procedures that drives its assigned robot to the destination while avoiding collisions. OA has the basic collision detection and avoidance procedures, and DA has task-specific collision avoidance guidance, such as the coordinates of pillars and how to avoid them.

We have implemented a prototype of the multi-agent system for mobile robot control using Agent Space (Satoh, 1999). Agent Space is a library for constructing mobile agents developed by Satoh. By using this library, the user can implement a mobile agent environment in the Java language. In Agent Space, mobile agents are defined as collections of call-back methods, and we implement the contents of the methods with the interfaces defined in the system. In order to create a mobile agent, the application calls the create method. An agent migrates to another site by using the move and leave methods. When an agent arrives, the arrive method is invoked. Migration by an agent is achieved by duplication of itself at the destination site. Thus the move and leave methods are used as a pair of methods for actual migration. Figure 3 shows the move method as an example. The other methods are implemented similarly. The users are expected to implement a destructor to erase the original agent in the leave


Figure 3. The move method

Figure 4. The position collecting agent (PCA)

Figure 5. The destination agent (DA)

method. Agent Space also provides services in its Application Program Interface (API), such as the move method to migrate agents and the invoke method to communicate with another agent. Figures 4 and 5 show how PCA and DA work, respectively.

The following is the PCA implementation:

1. UIA invokes the create method to create the mobile agent PCA, and hands the list of the IP addresses of the mobile robots to PCA.
2. PCA invokes the move method so that it can migrate to the mobile robot specified at the top of the list of IP addresses.
3. PCA invokes the leave method.
4. The agent actually migrates to the specified mobile robot.
5. PCA invokes the arrive method in the destination robot, and communicates locally with the OA in order to receive the coordinates of the robot.
6. PCA checks the next entry of the IP address list; if it has visited all the mobile robots, it returns to the host computer, otherwise it migrates to the next mobile robot with the IP address of the next entry in the list.

The following is the DA implementation:


1. CSA creates as many mobile agents DA as there are mobile robots in the field.
2. Each DA receives the IP address to which it is supposed to migrate, and the set of procedures to drive the robot.
3. The agents actually migrate to the specified mobile robots.
4. Each DA invokes the arrive method in the destination robot, constructs the sequence of commands from the given procedures, and then communicates with the robot control software called RCC (Robot Control Center) on the notebook computer on the robot in order to actually drive the mobile robot.
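The overall itinerary that PCA follows through these two lists can be summarized in a few lines of code. The sketch below is illustrative Python only; the actual agents are written against the Java-based Agent Space call-back API (create, move, leave, arrive, invoke), and helper names such as migrate_to and read_position_from_oa are hypothetical stand-ins for that machinery rather than real API calls.

# Illustrative sketch of the PCA itinerary; the real implementation uses
# the Java-based Agent Space callbacks rather than these hypothetical helpers.
def collect_positions(ip_addresses, migrate_to, read_position_from_oa, max_retries=3):
    """Visit every robot on the list and gather its coordinates."""
    positions = {}
    for ip in ip_addresses:
        arrived = False
        for _ in range(max_retries):          # retry before declaring the robot "lost"
            if migrate_to(ip):                 # move/leave, then arrive at the robot
                arrived = True
                break
        if not arrived:
            positions[ip] = None               # reported back to UIA as a lost robot
            continue
        positions[ip] = read_position_from_oa()  # ask the local OA for the RFID position
    return positions                           # handed to CSA on the host computer

UIA's own time-out handling, cancelling a silent PCA and re-creating it with a new identification number, sits outside this loop on the host computer.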
ANT COLONY CLUSTERING

In this section, we describe our ACC algorithm, which determines the quasi-optimal assembly positions for multiple robots. Coordination of an ant colony is achieved through indirect communication by pheromones. In previously explored ACO systems, artificial ants leave pheromone signals so that other artificial ants can trace the same path (Deneuburg, Goss, Franks, Sendova-Franks, Detrain, & Chretien, 1991). In our ACC system, however, we have attributed pheromone to objects, so that more objects are clustered in a place where strong pheromone is sensed. The simulation agent, CSA, performs the ACC algorithm as a simulation. The field of the simulation holds the coordinates of the objects and their pheromone values, so that the artificial ants can obtain all the necessary information (coordinates and pheromone) from the simulation field.

Randomly walking artificial ants have a high probability of picking up an object with weak pheromone, and of putting the object down where they sense strong pheromone. They are designed not to walk long distances, so the artificial ants tend to pick up scattered objects and produce many small clusters of objects. When a few clusters are generated, those clusters tend to grow.

Since the purpose of traditional ACC is clustering or grouping objects into several different classes based on selected properties, it is desirable there that the generated chunks of clusters grow into one big cluster such that each group has distinct characteristics. In our system, however, we want to produce several roughly clustered groups of the same type, and to minimize the movement of each robot. (We assume we have one kind of cart robot, and we do not want robots to move long distances.) Therefore our artificial ants have the following behavioral rules.

1. An artificial ant's basic behavior is a random walk. When it finds an isolated object, it picks it up.
2. When the artificial ant finds a cluster with a certain number of objects, it tends to avoid picking up an object from the cluster. This number can be updated later.
3. When an artificial ant carrying an object finds a cluster, it puts down the object so that the object is adjacent to one of the objects in the cluster.
4. If the artificial ant cannot find any cluster with a certain strength of pheromone, it just continues its random walk.

By the restrictions defined in the above rules, the artificial ants tend not to convey objects over long distances, and they produce many small heaps of objects in the first stage. In order to implement the first feature, the system locks objects that have a certain number of adjoining objects, and no artificial ant can pick up such a locked object. The number required for locking is updated later so that artificial ants can move previously locked objects in order to create larger clusters. When the initially scattered objects have been clustered into a small number of heaps, the number of objects that causes objects to be locked is updated, and the artificial ants re-start their activities to produce a smaller number
of clusters. We describe the behaviors of the artificial ants below.

In the implementation of our ACC algorithm, when the artificial ants are generated, they are given randomly supplied initial positions and walking directions. An artificial ant performs a random walk; when it finds an unlocked object, it picks up the object and continues its random walk. During its random walk, when it senses strong pheromone, it puts down the conveyed object. The artificial ants repeat this simple procedure until the termination condition is satisfied. Figure 6 shows the behavior of an artificial ant. We explain several specific actions that each artificial ant performs below.

Figure 6. The behavior of the artificial ant

The base behavior for all the artificial ants is the random walk. An artificial ant that does not carry any object moves straight in a randomly determined initial direction for ten steps, at which time it randomly changes direction. The ant also performs side steps from time to time to create further randomness.

When the artificial ant finds an object during its random walk, it determines whether or not to pick it up based on whether the object is locked and on the strength of the pheromone the object has, according to the value of formula (1) below. An artificial ant will not pick up any locked object. Whether an object is locked or not is also determined by formula (1). Here, p is the density of pheromone and k is a constant value; p itself is a function of the number of objects and the distance from the cluster of objects. The formula
simply says that the artificial ant does not pick up an object with strong pheromone.

f(p) = 1 − ((p + l) ∗ k) / 100        (1)

Currently, we choose p as the number of adjacent objects. Thus, when an object is completely surrounded by other objects, p takes its maximum value, set in our experiments to be nine. We choose k to be equal to thirteen in order to prevent any object surrounded by eight other objects from being picked up; the computed value is then f(p) = 0 (never pick it up). l is a constant value used for locking an object. Usually l is zero (not locked). When the number of clusters becomes less than two thirds of the total number of objects and p is greater than three, we set l to six. When the number of clusters becomes less than one third of the total number of objects and p is greater than seven, l becomes three. When the number of clusters becomes less than the number set by the user and p is greater than nine, l becomes one. Any objects that meet these conditions are deemed to be locked. This "lock" process prevents artificial ants from removing objects from growing clusters, and contributes to stabilizing the clusters' relatively monotonic growth.

When an artificial ant picks up an object, it changes its state into the "pheromone walk." In this state, an artificial ant tends to move probabilistically toward the place where it senses the strongest pheromone. The probability that the artificial ant takes a certain direction is n/10, where n is the strength of the sensed pheromone in that direction. Figure 7 shows the strengths of pheromones and their scope. This mechanism causes the artificial ants to move toward the nearest cluster, and consequently minimizes the moving distance.

Figure 7. Strengths of pheromones and their scope

An artificial ant carrying an object determines whether to put down the object or to continue to carry it. This decision is made based on formula (2); the more strongly it senses pheromone, the more it tends to put down the carried object. Here, p and k are the same as in formula (1). The formula simply says that when the artificial ant bumps into a locked object, it must put the carried object next to the locked object; the value is then f(p) = 1 (must put it down).

f(p) = (p ∗ k) / 100        (2)
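Read together, formulas (1) and (2) and the n/10 rule define three small probabilistic decisions. The following Python sketch is only an illustration of that logic under the constants quoted in the text (k = 13, l in {0, 1, 3, 6}); it is not the authors' implementation, and the function and parameter names are ours.

import random

K = 13  # the constant k used in formulas (1) and (2)

def pickup_probability(p, l):
    """Formula (1): chance of picking up an object with p adjacent objects
    and lock value l; a locked object (large l) is effectively never picked up."""
    return max(0.0, 1.0 - (p + l) * K / 100.0)

def putdown_probability(p):
    """Formula (2): chance of putting the carried object down next to a
    cluster whose number of adjacent objects is p."""
    return min(1.0, p * K / 100.0)

def choose_direction(pheromone_by_direction):
    """One simple reading of the n/10 rule for the pheromone walk: a direction
    whose sensed strength is n is accepted with probability n/10."""
    for direction, n in pheromone_by_direction.items():
        if random.random() < n / 10.0:
            return direction
    return None  # no direction accepted; the ant keeps its random walk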
Conventional ACC algorithms terminate when all the objects in the field are clustered, or when a predefined number of steps has been executed (Chen, Xu & Chen, 2004). Under such conditions, however, the clustering may finish before satisfactory clusters are obtained. Therefore we set the termination condition of our ACC algorithm to be that the number of resulting clusters is less than ten and that all the clusters have three or more objects. This condition may cause a longer computation time than the usual ACC, but preliminary experiments show that it produces reasonably good clustering.
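As a concrete reading of this termination rule, the check can be written in a few lines. The sketch below is illustrative Python; how clusters of adjacent objects are detected is left abstract here.

def clustering_finished(clusters, max_clusters=10, min_cluster_size=3):
    """Stop when fewer than max_clusters clusters remain and every cluster
    holds at least min_cluster_size objects, as described in the text."""
    return (len(clusters) < max_clusters and
            all(len(c) >= min_cluster_size for c in clusters))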
Figure 8. A team of mobile robots works under the control of mobile agents

THE ROBOTS

In this section, we demonstrate that the model of static and mobile agents with ant colony clustering (ACC) is suitable for intelligent multi-robot systems. We have employed the ER1 Personal Robot Platform Kit by Evolution Robotics Inc. as the platform for our prototype (Evolution Robotics, 2008). Each robot has two servomotors with tires. The power is supplied by a rechargeable battery. It has a servomotor controller board that accepts RS-232C serial data from a host computer. Each robot holds one notebook computer as its host computer. Our control mobile agents migrate to these host computers by wireless LAN. One robot control application, called RCC (Robot Control Center), resides on each host computer. Our mobile agents communicate with RCC to receive sensor data. Figure 8 shows a team of mobile robots working under the control of mobile agents.

In the previous implementation, an agent on the robot calculated the current coordinates from the initial position, and the computed coordinates could differ from the actual positions (Ugajin, Sato, Tsujimura, Yamamoto, Takimoto & Kambayashi, 2007). The current implementation employs RFID (Radio Frequency Identification) to get precise coordinates. We set RFID tags in a regular grid shape under the floor carpet tiles. The tags we chose have a small range so that the position-collecting agent can obtain fairly precise coordinates from the tag. Figure 9 shows the RFID under a floor carpet tile. The robot itself has a basic collision avoidance mechanism using infrared sensors.

For driving robots along a quasi-optimal route, one needs not only the precise coordinates of each robot but also the direction each robot faces. In order to determine the direction that it is facing, each robot moves straight ahead in the direction
it is currently facing and obtains two positions (coordinates) from RFID tags under the carpet tiles. Determining the current orientation is important because there is a high cost for making a robot rotate through a large angle. It is desirable for each robot to be assigned rather simple forward movements rather than complex movements with several direction changes when there are obstacles to avoid. Therefore, whenever OA is awake, it performs two actions, i.e. obtaining the current position and calculating the current direction.

Figure 9. RFID under a carpet tile

Figure 10. RFID tags in square formation

Figure 11. Determining orientation from two RFID tag positions

Since each carpet tile has nine RFID tags, as shown in Figure 10, the robot is supposed to obtain the current position as soon as OA gets the sensor data from the RFID module. If OA cannot obtain the position data, it means that the RFID module cannot sense an RFID tag, and OA makes the robot rotate until the RFID module senses an RFID tag (usually a small degree). Once OA obtains the current position, it drives the robot a short distance until the RFID module detects a second RFID tag. Upon obtaining two positions, a simple computation determines the direction in which the robot is moving, as shown in Figure 11.
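That simple computation is just the heading of the displacement between the two tag positions. A minimal sketch in Python, assuming the tags report (x, y) grid coordinates:

import math

def heading_from_two_tags(first, second):
    """Return the robot's heading in degrees (0 = +x axis, counter-clockwise)
    from two successive RFID tag positions given as (x, y) pairs."""
    dx = second[0] - first[0]
    dy = second[1] - first[1]
    return math.degrees(math.atan2(dy, dx))

For example, heading_from_two_tags((0, 0), (1, 1)) returns 45.0 degrees.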
In a real situation, we may find a robot close to a wall and facing it. Then we cannot use the simple method just described, because the robot may move forward and collide with the wall. The robot has two infrared sensors, but their ranges are very short, and when they sense the wall it is often too late to compute and execute a response. In order to accommodate such situations, we make the RFID tags near the wall send a special signal to the robot that tells it that it is at the end of the field (near the wall), so that a robot sensing the signal can rotate in the opposite direction, as
shown in Figure 12. This is required only in our experimental implementation, because the current collision detection mechanism does not otherwise work well enough to avoid collisions with the wall. Therefore we employ the special signal in the RFID tags. When the robot finds a wall while moving to obtain its location, it arbitrarily changes direction and starts obtaining two positions again (Kambayashi, Sato, Harada, & Takimoto, 2009).

Figure 12. RFID tags near the wall emit a special signal

NUMERICAL EXPERIMENTS

We have conducted several numerical experiments in order to demonstrate the effectiveness of our ACC algorithm using a simulator. The simulator has an airport-like field: a sixty by sixty grid cut into two right equilateral triangles with fifteen-unit sides at the top left and right, and one trapezoid with a top side of thirty, a bottom side of fifty-eight, and a height of fifteen at the bottom, as shown in Figure 13 (a) through (c). We have compared the results of our ACC algorithm with an algorithm that groups objects at predefined assembly positions. The desirable arrangements are those with low moving costs and fewer clusters. Therefore, we defined the comparative fitness value represented in formula (3). Here, Clust represents the number of clusters, and Ave represents the average distance of all the objects moved.

Eval = Ave + Clust        (3)
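Formula (3) transcribes directly into code. The sketch below is an illustrative Python version, assuming moves is the list of distances the individual objects were moved and clusters is the resulting grouping; lower values are better, since both terms are costs.

def evaluate(moves, clusters):
    """Formula (3): Eval = Ave + Clust, where Ave is the average moving
    distance of all objects and Clust is the number of clusters."""
    ave = sum(moves) / len(moves)
    return ave + len(clusters)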
The results of the experiments are shown in Tables 1 through 3. The results of our ACC algorithm are displayed in the column "Ant Colony Clustering," and the results of gathering objects at predefined assembly positions are displayed in the column "Specified Position Clustering." We have implemented both algorithms in the simulator. In the "Specified Position Clustering," we set four assembly positions. For the experiments, we set three cases, with 200, 300 and 400 objects. In every case, we set the number of artificial ants to be 100. We performed five trials for each setting and obtained the evaluation values calculated by formula (3), as shown in Tables 1 through 3. In all three cases, the computations finished in less than six seconds; the computing environment of the experiments was a JVM on a Pentium IV 2.8 GHz under Windows XP. Since the physical movements of robots take much more time than computation, we can ignore the computational complexity of the ACC algorithm.

In every case, our ACC algorithm produced a better result than that for predefined assembly positions. We can observe that as the number of objects increases, the average moving distance of each object decreases. We can, however, observe that increasing the number of ants does not contribute to the quality of clustering, as shown in Tables 4 and 5.

In the first implementation, we did not employ a "lock" process. Under those conditions we observed no convergence; the number of clusters was roughly 300. When we added the "lock" process, clusters started to emerge, but they tended to stabilize at about thirty clusters. At that point the artificial ants see only immediately adjacent places. When we added the feature that artificial ants can dynamically change their scope in the field, the small clusters converged into a few big clusters.
Figure 13. Clustered objects constructed by assembling at positions computed by ACC and at predefined
positions

Table 1. Airport field (Objects: 200, Ant Agents: 100)

                Ant Colony Clustering               Specified Position Clustering
Trial     Cost      Ave    Clust    Eval       Cost      Ave    Clust    Eval
1         2279    11.39      4     15.39       5826    29.13      4     33.13
2         2222    11.11      3     14.11       6019    30.09      4     34.09
3         2602    13.01      2     15.01       6194    30.97      4     34.97
4         2433    12.16      3     15.16       6335    31.67      4     35.67
5         2589    12.94      4     16.94       6077    30.38      4     34.38

Table 2. Airport field (Objects: 300, Ant Agents: 100)

                Ant Colony Clustering               Specified Position Clustering
Trial     Cost      Ave    Clust    Eval       Cost      Ave    Clust    Eval
1         2907     9.69      4     13.69       8856    29.52      4     33.52
2         3513    11.71      4     15.71       9142    30.47      4     34.47
3         3291    10.97      4     14.97       8839    29.46      4     33.46
4         3494    11.64      3     14.64       8867    29.55      4     33.55
5         2299     7.66      6     13.66       9034    30.11      4     34.11

Table 3. Airport field (Objects: 400, Ant Agents: 100)

                Ant Colony Clustering               Specified Position Clustering
Trial     Cost      Ave    Clust    Eval       Cost      Ave    Clust    Eval
1         4822    12.05      1     13.05      11999    29.99      4     33.99
2         3173     7.93      6     13.93      12069    30.17      4     34.17
3         3648     9.12      4     13.12      12299    30.74      4     34.74
4         3803     9.51      3     12.51      12288    30.72      4     34.72
5         4330    10.82      5     15.82      12125    30.31      4     34.31

Table 4. Airport field (Objects: 300, Ant Agents: 50)

                Ant Colony Clustering               Specified Position Clustering
Trial     Cost      Ave    Clust    Eval       Cost      Ave    Clust    Eval
1         3270    10.90      5     15.90       9156    30.52      4     34.52
2         2754     9.18      5     14.18       9058    30.19      4     34.19
3         3110    10.36      4     14.36       9006    30.02      4     34.02
4         3338    11.12      3     14.12       9131    30.43      4     34.43
5         2772     9.24      5     14.24       8880    29.60      4     33.60

Table 5. Airport field (Objects: 300, Ant Agents: 200)

                Ant Colony Clustering               Specified Position Clustering
Trial     Cost      Ave    Clust    Eval       Cost      Ave    Clust    Eval
1         3148    10.49      4     14.49       8887    29.62      4     33.62
2         3728    12.42      3     15.42       8940    29.80      4     33.80
3         2936     9.78      5     14.78       8923    29.73      4     33.73
4         3193    10.64      3     13.64       9408    31.36      4     35.36
5         4131    13.77      2     15.77       9309    31.03      4     35.03

Then the artificial ants can see ten to twenty positions ahead. As a result of our experiments, we realize that we need to continue to improve our ACC algorithm.

CONCLUSION AND FUTURE DIRECTIONS

We have presented a framework for controlling multiple mobile robots connected by communication networks. Mobile and static agents collect the coordinates of the scattered mobile robots and implement the ant colony clustering (ACC) algorithm in order to find quasi-optimal positions at which to assemble the mobile robots. Making the mobile robots perform the ant colony optimization themselves would be enormously inefficient. Therefore a static agent performs the ACC algorithm in its simulator and computes the quasi-optimal positions for the mobile robots. Then other mobile agents carrying the requisite set of procedures migrate to the mobile robots and direct the robots using the sequences of robot control commands constructed from the given set of procedures.

Since our control system is composed of several small static and mobile agents, it shows excellent scalability. When the number of mobile robots increases, we can increase the number of mobile software agents that direct the mobile robots. The user can enhance the control software by introducing new features as mobile agents, so that the multi-robot system can be extended dynamically while the robots are working. Mobile agents also decrease the amount of necessary communication. They make mobile multi-robot applications possible at remote sites with unreliable communication. In unreliable communication environments, the multi-robot system may not be able to maintain consistency among the states of the robots in a centrally controlled manner. Since a mobile agent can bring the necessary functional-
ities with it and perform its tasks autonomously, it can reduce the necessity for interaction with other sites. In the minimal case, a mobile agent requires that the connection be established only when it performs migration (Binder, Hulaas & Villazon, 2001). The concept of a mobile agent also creates the possibility that new functions and knowledge can be introduced to the entire multi-agent system from a host or controller outside the system via a single accessible member of the intelligent multi-robot system (Kambayashi & Takimoto, 2005). While our imaginary application is simple cart collection, the system should have a wide variety of applications.

We have implemented a team of mobile robots to show the feasibility of our model. In the current implementation, an agent on the robot can obtain fairly precise coordinates of the robots from RFID tags.

The ACC algorithm we have proposed is designed to minimize the total distance that objects are moved. We have analyzed and demonstrated the effectiveness of our ACC algorithm through simulation, performing several numerical experiments with various settings. Although we have so far observed favorable results from the experiments in the simulator, applying the results of the simulation to a real multi-robot system is difficult. Analyzing the results of the simulation, we often find that the sum of the moving distances of all the robots is not minimal, as we had expected it to be. We have re-implemented the ACC algorithm to use only the sum of moving distances and have found some improvement. Even though we believe that the multi-agent framework for controlling multi-robot systems is the right direction, we have to overcome several problems before constructing a practical working system.

Compared with the time for robot movements, the computation time for the ACC algorithm is negligible. Even if the number of artificial ants increases, the computation time will increase linearly, and the number of objects should not influence the computation's complexity. Because any one step of each ant's behavior is simple, we can assume it takes constant execution time. Even though this appears obvious, we need to confirm it with quantitative experiments.

One big problem is to determine how we should include the collision avoidance behaviors of the robots in the simulation. We need to quantify real robot movements more completely. Collision avoidance itself is a significant problem because the act of clustering creates a jam of moving robots. Each driving agent must maneuver its robot precisely to the destination while avoiding colleague robots as it dynamically determines its destination coordinates. This task requires much more intelligence than we had expected early in this project.

During the experiments, we experienced the following unfavorable situations:

1. Certain initial arrangements of objects cause very long clustering times,
2. If one large cluster is created at an early stage of the clustering and the rest of the field has scarce objects, then all the objects are assembled into that one large cluster. This situation subsequently makes the aggregate moving distance long, and
3. In very rare cases, the simulation does not converge.

Even though such cases are rare, these phenomena suggest further avenues of research for our ACC algorithm. As we mentioned in the previous section, we need to design the artificial ants to have certain complex features that change their ability to adapt to circumstances. We defer this investigation to our future work.

On the other hand, when a certain number of clusters have emerged and stabilized, we can coerce them into several (three or four) clusters by calculating the optimal assembly points. This coercion to required assembly points is another direction for our future work. We may also investigate computing the number
of clusters and their rough positions prior to performing the ACC algorithm, so that we can save much computation time. In many ways, we have room to improve our assembly point calculation method before integrating everything into one working multi-robot system.

ACKNOWLEDGMENT

We appreciate Kimiko Gosney, who gave us useful comments. This work is partially supported by the Japan Society for the Promotion of Science (JSPS), basic research program (C) (No. 20510141), Grant-in-Aid for Scientific Research.

REFERENCES

Becker, M., & Szczerbicka, H. (2005). Parameters Influencing the Performance of Ant Algorithm Applied to Optimisation of Buffer Size in Manufacturing. Industrial Engineering and Management Systems, 4(2), 184–191.

Binder, W. J., Hulaas, G., & Villazon, A. (2001). Portable Resource Control in the J-SEAL2 Mobile Agent System. In Proceedings of the International Conference on Autonomous Agents (pp. 222-223).

Chen, L., Xu, X., & Chen, Y. (2004). An Adaptive Ant Colony Clustering Algorithm. In Proceedings of the Third IEEE International Conference on Machine Learning and Cybernetics (pp. 1387-1392).

Deneuburg, J., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C., & Chretien, L. (1991). The Dynamics of Collective Sorting: Robot-Like Ant and Ant-Like Robot. In Proceedings of the First Conference on Simulation of Adaptive Behavior: From Animals to Animats (pp. 356-363). Cambridge: MIT Press.

Dorigo, M., & Gambardella, L. M. (1996). Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem. IEEE Transactions on Evolutionary Computation, 1(1), 53–66. doi:10.1109/4235.585892

Evolution Robotics Ltd. (2008). Homepage. Retrieved from http://www.evolution.com/

Kambayashi, Y., Sato, O., Harada, Y., & Takimoto, M. (2009). Design of an Intelligent Cart System for Common Airports. In Proceedings of the 13th International Symposium on Consumer Electronics. CD-ROM.

Kambayashi, Y., & Takimoto, M. (2005). Higher-Order Mobile Agents for Controlling Intelligent Robots. International Journal of Intelligent Information Technologies, 1(2), 28–42.

Kambayashi, Y., Tsujimura, Y., Yamachi, H., Takimoto, M., & Yamamoto, H. (2009). Design of a Multi-Robot System Using Mobile Agents with Ant Colony Clustering. In Proceedings of the Hawaii International Conference on System Sciences. IEEE Computer Society. CD-ROM.

Lumer, E. D., & Faieta, B. (1994). Diversity and Adaptation in Populations of Clustering Ants. In From Animals to Animats 3: Proceedings of the 3rd International Conference on the Simulation of Adaptive Behavior (pp. 501-508). Cambridge: MIT Press.

Murphy, R. R. (2000). Introduction to AI Robotics. Cambridge: MIT Press.

Nagata, T., Takimoto, M., & Kambayashi, Y. (2009). Suppressing the Total Costs of Executing Tasks Using Mobile Agents. In Proceedings of the 42nd Hawaii International Conference on System Sciences. IEEE Computer Society. CD-ROM.
Sato, O., Ugajin, M., Tsujimura, Y., Yamamoto, H., & Kambayashi, Y. (2007). Analysis of the Behaviors of Multi-Robots that Implement Ant Colony Clustering Using Mobile Agents. In Proceedings of the Eighth Asia Pacific Industrial Engineering and Management System Conference. CD-ROM.

Satoh, I. (1999). A Mobile Agent-Based Framework for Active Networks. In Proceedings of the IEEE Systems, Man, and Cybernetics Conference (pp. 161-168).

Takimoto, M., Mizuno, M., Kurio, M., & Kambayashi, Y. (2007). Saving Energy Consumption of Multi-Robots Using Higher-Order Mobile Agents. In Proceedings of the First KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications (LNAI 4496, pp. 549-558).

Toyoda, Y., & Yano, F. (2004). Optimizing Movement of a Multi-Joint Robot Arm with Existence of Obstacles Using Multi-Purpose Genetic Algorithm. Industrial Engineering and Management Systems, 3(1), 78–84.

Ugajin, M., Sato, O., Tsujimura, Y., Yamamoto, H., Takimoto, M., & Kambayashi, Y. (2007). Integrating Ant Colony Clustering Method to Multi-Robots Using Mobile Agents. In Proceedings of the Eighth Asia Pacific Industrial Engineering and Management System Conference. CD-ROM.

Wang, T., & Zhang, H. (2004). Collective Sorting with Multi-Robot. In Proceedings of the First IEEE International Conference on Robotics and Biomimetics (pp. 716-720).

KEY TERMS AND DEFINITIONS

Mobile Robot: In contrast to an industrial robot, which usually consists of a multi-linked manipulator and an end effector attached to a fixed surface, a mobile robot has the capability to move around in its environment. "Mobile robot" often implies autonomy: autonomous robots can perform desired tasks in unstructured environments with minimal user intervention.

Mobile Agent: A piece of program that can migrate from one computational site to another while it is under execution.

Multi-Robots: A set of mobile robots. They are relatively small and are expected to achieve given tasks by cooperating with each other.

Intelligent Robot Control: A method to control mobile robots. This method allows mobile robots to behave autonomously, reducing user interventions to a minimum.

Swarm Intelligence: The property of a system whereby the collective behaviors of (mobile) agents interacting locally with their environment cause coherent functional global patterns to emerge. A swarm has been defined as a set of (mobile) agents which are liable to communicate directly or indirectly with each other, and which collectively carry out distributed problem solving.

Ant Colony Optimization: A probabilistic technique inspired by the behaviors of social insects, "ants." It was proposed as a method for solving hard combinatorial optimization problems, and is known to be useful for solving computational problems which can be reduced to finding good paths through graphs.

Clustering Algorithm: An algorithm that extracts similar data items from unstructured data and groups them into several clusters.
Section 5
Multi-Agent Games and
Simulations

Chapter 11
The AGILE Design of
Reality Game AI
Robert G. Reynolds
Wayne State University, USA

John O’Shea
University of Michigan-Ann Arbor, USA

Xiangdong Che
Wayne State University, USA

Yousof Gawasmeh
Wayne State University, USA

Guy Meadows
University of Michigan-Ann Arbor, USA

Farshad Fotouhi
Wayne State University, USA

ABSTRACT
This chapter investigates the use of agile program design techniques within an online game develop-
ment laboratory setting. The proposed game concerns the prediction of early Paleo-Indian hunting sites
in ancient North America along a now submerged land bridge that extended between Canada and the
United States across what is now Lake Huron. While the survey of the submerged land bridge was being
conducted, the online class was developing a computer game that would allow scientists to predict where
sites might be located on the landscape. Crucial to this was the ability to add in gradually different levels
of cognitive and decision-making capabilities for the agents. We argue that the online component of the
courses was critical to supporting an agile approach here. The results of the study indeed provided a
fusion of both survey and strategic information that suggest that movement of caribou was asymmetric
over the landscape. Therefore, the actual positioning of human artifacts such as hunting blinds was
designed to exploit caribou migration in the fall, as is observed today.

DOI: 10.4018/978-1-60566-898-7.ch011

INTRODUCTION

Agile software design methodologies are a response to traditional plan-based approaches such as the waterfall model. There are a number of different agile methodologies. These support short-term increments in the development of a software project, and these increments reflect short rather than long-term planning decisions. Iterations are done in short time frames known as "time boxes". These boxes range from 1 to 4 weeks in duration. Within each time box, aspects of a full software development life cycle can take place, including planning, requirements analysis, design, coding, and unit and acceptance testing. At the end of each time box there is an available release, which may still not contain all of the intended functionality.

Work is done by small groups of 5 to 9 individuals whose composition is cross-functional and self-organizing, without consideration of organizational structure or hierarchy. The goal is to encourage face-to-face communication in preference to written communication. Associated with a team is a customer representative who makes a personal commitment to being available for development questions. The production of working software is the prime indicator of progress. Due to the emphasis on face-to-face communication, the number of written documents produced is often less than in other methods, although documents are viewed as ranking equally with the working product and other produced artifacts.

Recently an agile approach has been applied to the development of a game to support research in locating Paleo-Indian occupation sites and hunting camps along a land bridge that spanned Lake Huron from what is now Michigan to what is now Ontario, Canada. This project received a 2008 NSF High Risk grant in order to generate evidence for possible occupation of the ancient land bridge, which is now under hundreds of feet of water. Figure 1 below provides the bathymetry, or depth profile, of Lake Huron, with the lighter regions being of higher elevation. The target of the project is a stretch of land labeled the Alpena-Amberly Ridge. This runs from the Alpena region of Michigan on the west to the Godderich region of Canada on the east.

The goal of the project was to perform a sonar-based underwater survey of the area in order to see if there was evidence of human modification of the landscape. This was conducted through the University of Michigan-Ann Arbor by Dr. John O'Shea. More detailed underwater surveys would depend on acquired knowledge of where specifically to look; the devices used for such surveys can be manned or robotic, but by their nature they require a small area to search. With this in mind we developed a software game as part of the Computer Science Game Programming class at Wayne State University, CSC 5430. The class had a lecture and a laboratory and was both in-class and online. The goal of the game was to simulate the movement of animal and human hunting and foraging agents over the land bridge. In particular, we focused on spring and fall, when it was expected that herds of caribou would use the bridge as part of their annual migrations. It was felt that such movements would attract hunters who would restrict aspects of the bridge in order to facilitate the hunting of caribou.

Therefore, the amount of social intelligence allocated to the caribou and the human hunters in the game will affect their movement and the number of caribou taken. In the game, the objective was to place hunters, their encampments, their hunting stands, and purposefully placed obstacles such as fence-like rock configurations in order to maximize the count of caribou taken. Those configurations of components that produced an above-average yield would suggest to the underwater team where to most profitably perform further search.

Since underwater surveys are conducted in late August and September, at the same time as the onset of the gaming class, it was felt that each avenue of research can provide data about human
occupation, both archaeologically and via the game play. It was of interest to see how the two sources related to each other at the end of the term. The course was set up to allow the gradual layering of intelligence into each of the major agents, caribou and humans, through the support of an explicitly agile methodology. In fact, we felt that the online aspect of the course would play an important role in the application of the methodology, since it would facilitate student communication and allow the archaeologists to have access to class discussions and demonstrations and to provide feedback. Given the physical separation of the data collection and the software development sites, a face-to-face approach was not possible. In this chapter, we use online technology to substitute for the face-to-face interaction required by the agile development methodology. Our goal will be to see if this approach can still support the agile development of a real-world game for this application.

In section 2 the Land Bridge project is briefly described. Section 3 describes how the 12 basic tenets of agile programming, its manifesto, are supported within the online course framework. Then, in section 4 the basic steps in the layering of the social intelligence into the system using Cultural Algorithms, through laboratory and lecture assignments, are given. Section 5 provides an example of how the project developed within this context. Section 6 concludes the chapter by assessing how the project results correspond to the survey results and describes future work.

Figure 1. Bathymetry of Lake Huron. The land bridge is bright yellow and labeled the Alpena Amberly Ridge on the map.

ThE LAKE STANLEY LAND BRIDGE

Overview

It is difficult, if not impossible, to consider the character of the Paleo-Indian and Early Archaic occupation of the Great Lakes region without reference to the grudging withdrawal of the continental ice sheet, and the subsequent rises and drops in the waters of the Great Lakes that accompanied the region's gradual transition to its modern appearance. Archaeologists have used the sequence of high water beaches to date these early sites, although with few exceptions the sites have
been thoroughly disturbed by later land use, and their faunal assemblages have been dissolved by the acid forest soils. For sites associated with the periods of low water, there are often no surface sites to be found at all. Some can be found deeply buried beneath later lake sediments, and many more are presumed lost forever, somewhere out under the lakes.

Newly released high-resolution bathymetry of the Great Lakes, coupled with advances in 3-D surface modeling, makes it possible to once again view the ancient landforms from these low water periods. This in turn raises the possibility of discovering the early human settlements and activity sites that existed in this environment. In this project, we seek to explore this potential with respect to the earliest of the major low water stands in the Lake Huron basin, Lake Stanley, and the unique causeway structure that once linked Michigan with Ontario. This causeway or "land bridge" would have been available for colonization by plant, animal, and human communities. It would also have been a vehicle to support the movement of animals, specifically caribou, across Lake Huron during spring and fall migrations.

The Survey and Data Collection Methodology

The post-glacial history of the Great Lakes is characterized by a series of high and low water stands produced by the interaction of early Holocene climate, the flows of glacial melt waters and the isostatic rebound of recently deglaciated land surfaces (Lewis, et al., 1994; Moore, Rea, Mayer, Lewis, & Dobson, 1994; Larsen, 1999), as shown in Figure 2. The most extreme of the low water stands in the Lake Huron basin is referred to as the Lake Stanley stage (Lake Chippewa in the Lake Michigan basin), which spanned roughly 10,000 to 7500 BP and was associated with lake levels as low as 80-90 m amsl (compared to the modern standard of 176 masl (meters above sea level)) (Lewis et al., 2007).

Figure 2. The history of the ancient Great Lakes. As the glaciers retreated a precursor to Lake Huron was formed, Lake Stanley, around 9000 B.P.

When projected at these levels, the Lake Huron basin contains two lakes separated by a ridge or causeway extending northwest to southeast across the basin from the area of Presque Isle, Michigan to Point Clark in Ontario. The causeway, termed the Alpena-Amberley Ridge, averages 16 km in width (Figure 1) and is capped with glacial till and Middle Devonian limestone and dolomite (Hough, 1958; Thomas, Kemp, & Lewis, 1973). It is represented via a topographic map where the
three targeted survey regions are highlighted by boxes. Region 3, in the middle of the three, was the target for the first surveys here.

The earliest human occupation in the upper Great Lakes is associated with a regional fluted point Paleo-Indian tradition which conventionally ends at the start of the Lake Stanley low water stage (Ellis, Kenyon, & Spence, 1990; Monaghan & Lovis, 2005; Shott, 1999). The terminal Paleo-Indian and Early and Middle Archaic populations that inhabited the region during Lake Stanley times would have experienced an environment that was colder and drier than the present, with a spruce-dominated forest (Croley & Lewis, 2006; Warner, Hebda, & Hahn, 1984). Sites associated with these time periods are rare. While some are found preserved beneath meters of later lake sediment (Lovis, 1989), it is generally assumed that most were lost as Lake Huron rose to its modern levels. Here we report on the first evidence for human activity on the Alpena-Amberley Land Bridge, a structure that during the Lake Stanley low water phase would have provided a land connection across the middle of modern Lake Huron linking northern Michigan with central Ontario.

Archaeologists have long recognized the potential for discovering sites of Pleistocene and early Holocene age in coastal areas that experienced repeated exposure and submergence, although these efforts have typically focused on marine environments that were subject to global changes in sea level. During the past year, investigators from the Museum of Anthropology and the Marine Hydrodynamics Laboratory at the University of Michigan have begun the task of testing whether human occupation sites are present on the Alpena-Amberley Ridge beneath Lake Huron. A particularly tantalizing possibility is the potential that stone constructions, such as caribou drive lanes, hunting blinds and habitation sites, of a kind only preserved in subarctic regions, might be visible on the lake bottom.

To discover sites within this setting, a multilayered search strategy was developed. The data used in this chapter were collected by a surface-towed side scan sonar and remote operated vehicles (ROVs). An example is shown in Figure 3 below. The side scan survey was conducted using a digital side scan sonar unit (Imagenex) at a frequency of 330 kHz and a depth of 30 m, mapping overlapping swaths of roughly 200 m. Targets of interest, identified from the acoustic survey, were examined using a remote operated vehicle (ROV). The current work utilized two mini-ROVs, a SeaBotix LBV 150 and an Outland 1000, which can be manually deployed from a small craft. Two pilot search areas have been covered, representing a total area of 72 sq km, at depths ranging from 12 to 150 m.

Figure 3. Examples of remote operated vehicles

Based upon the results of the current survey and the corresponding reality game, we hope to acquire sufficient information to motivate a more detailed search using autonomous underwater vehicles (AUVs) and direct observation by archaeologists using SCUBA. The next section will provide an overview of the software development methodology that we used to develop the research game.

ThE AGILE METhODOLOGY

Why Agile?

Boehm and Turner (2004) suggest that the "home ground" for the use of agile program design methodologies can be described in terms of the following factors. First, the system to be developed has low criticality relative to existing systems. Secondly, it involves the use of senior developers. The third is that the requirements change often. Fourthly, the number of developers is small, 5 to 10. Lastly, the target project domain is one that thrives on "chaos"; that is, it is a project of high complexity.

All of the factors are true in this case. The system has a low criticality since there is no existing system that depends directly on its production. Also, the survey is currently independent of the
results. Thus, results of the software are not needed at this point to determine survey agendas. Since this study is in a sense one of a kind, there is very little a priori knowledge about what to expect from the survey, as well as what the agent-based game should generate.

As for the fourth point, the development group is small, consisting of senior undergraduates and graduate students. In addition, O'Shea functioned as the expert user. Developers communicated indirectly with him via the video-streaming device. Lectures and project discussions were recorded, and the web links were given to all parties for reference. Reynolds was the course instructor, and with a research position in the Museum of Anthropology he was able to facilitate a dialogue between the expert and the student developers.

Figure 4. A topographic map of the Lake Stanley Causeway extending from the mitten portion of Michigan on the left to Ontario on the right. The three target survey regions along the causeway are highlighted.
Since this is a research project that by definition is a "high risk" endeavor, there is much uncertainty about what can possibly be extracted from these surveys and inferred from the game program activities. So things tended to change from day to day as the survey commenced. This was in fact an exciting thing for the students to participate in. Between the lecture and lab there were 14 assignments, about one a week, which meant a constant workload for the students. However, all 10 students, both online and in class, stayed on to complete the project. This is quite an unusual occurrence, since during the course of the term there are many events beyond the control of the student, yet they all delivered the final project.

Supporting the Agile Approach with Online Technology

Wood and Kleb demonstrated that agile methodologies, specifically Xtreme Programming, developed primarily to support a business customer-developer relationship, can be extended to research-based projects (Wood & Kleb, 2002). Here we extend the application of agile methodologies to developing research-based programs using an online course environment within an academic setting. We feel that the addition of an online component to a game development course enhances the ability of the course to support an agile development methodology.

The best way to demonstrate how this enhances the application of agile methods is to describe how each of the basic features of the "Agile Manifesto" (Ambler, 2008) is supported in the classroom here. The features and their implementation are as follows:

Rapid Continuous Delivery of Useful Software

In our case this meant that there would be a lecture or lab assignment every week over the 14 weeks of the course. The duration of each assignment was on average one week. Each new assignment added complexity to the previous assignment in a gradual fashion. How the layering of complexity was performed here is described in section 4. The course instructor evaluated the submissions and passed the evaluated work and questions along to the expert through twice-weekly videos and one weekly face-to-face meeting.

Frequently Delivered Software

The goal was to have a working program each week that contained a new feature in addition to the features from previous ones. The online component allowed the instructor to critique submissions in terms of the requirements. This resulted in students "tracking" these issues and incorporating the desired features into subsequent programs, and in undesirable features being eliminated quickly throughout the group. While each programmer did their own program, they were able to share utility modules and graphics, which encouraged desirable features to spread. One student, for example, came up with a sound package that he distributed.

Working Software is the Principal Measure of Progress

The language for the class was Python, a text-based scripting language. The emphasis was on making the code as self-documenting as possible. While some documentation was required, such as object-oriented diagrams, the emphasis during the evaluations done in class was on performance, code quality, and readability. This made delivering a new program each week feasible for a student, given the reduced emphasis on additional documentation. The online component allowed the instructor to execute the submissions and provide feedback to students during the lecture as part of the critique.
Changes in Project Requirements are Welcomed at Any Time, Even Late

The video streaming component of the course supported the change of requirements. New requirements can be motivated and described as part of the lecture, and students can go back and review the lecture video if they have problems.

Close Daily Cooperation between Consumer and Developers

The online feature made it possible for the user to keep abreast of the project in the course and observe the critiques by the instructor of the students' project code. The user and instructor would meet weekly to exchange information. In this case, frequent online interaction substituted for face-to-face contact. Online students had to "check out" after each lecture by taking a short debriefing quiz at the end of the lecture. This worked to keep students up with the changing shape of the project.

Collocation of Developers to Support Communication

The organization of the course was three-tiered: the students, the instructor, and the domain expert. While the expert at the top level and the students were not collocated, the instructor was able to function as the intermediary between the two groups. The student group met with the instructor three times a week, and the instructor met with the expert, who was not co-located, once a week. While there was no physical collocation, the online framework made the interaction go smoothly, since the class meetings were recorded and available for others to comment on.

Project is Built Around a Motivated Individual

Professor O'Shea is an expert in Great Lakes underwater archaeology and an enthusiastic advocate for the education of students about Michigan prehistory. This rubbed off on the students through the papers and documents that were provided as well.

Simplicity

The project was designed to start simple and to gradually add complexity to the design. The layering of complexity related primarily to adding the Artificial Intelligence components.

Self Organizing Teams

Teams were not directly supported in the classroom framework, but the weekly review of student submissions encouraged students to look to others for ideas and insights. Given that the discussions were taped, online students could get a good feel for contributions made by the in-class component.

Regular Adaptation to Changing Circumstances

Since the underwater survey was ongoing as the class was taking place, the weekly assignments were adjusted to reflect student performance on the previous assignments and input from the field by the team of archaeologists. As a result, basic cognitive capabilities were added first to the agents, and then more detailed behaviors as they appeared to be warranted by the survey results.

In summary, this project required a flexible methodology, but since its participants were not physically co-located, the online component of the course was necessary to realize the potential of an agile approach here. In the next section we briefly describe the game and its components as they emerged over the term.
GAME DEVELOPMENT

Game Design

It was important that the students understood the genre to which the current game was related. Students were introduced to popular hunting games such as Atari's "Deer Hunter", and Klaus Jurgen Wrede's "Carcassonne", a board game based upon ancient hunting and foraging agents in Europe. In our game, the idea was to position a campsite and hunting blinds over a portion of the land bridge. Caribou, the North American reindeer, would move from northwest to southeast over the bridge in the fall, and back across the bridge in the spring. The landscape of the bridge was assumed to be that of Arctic tundra populated by spruce and grasses and containing possible small stands of water.

The player positioned the campsite and hunting stands on the land bridge. Then the hunters would autonomously move out of the campsite daily and position themselves in the nearest available blind. The hunting agents used state machines to emulate decision making, using knowledge learned with Cultural Algorithms. Cultural Algorithms are an extension of Genetic Algorithms that are used to generate the social intelligence that guides the interaction of multiple agents. When a caribou came within a certain distance of the blind, the hunter became aware of it and tried to kill it. At the end of the day, hunters would return to the camp with a portion of their kill. Additionally, other predators co-existed on the landscape with humans, in this case wolves. Wolves hunted in packs and generally kept away from humans unless they met head on via a chance encounter, or unless a wolf became so hungry that it would attack humans.

Each of the object categories had both an individual and a social intelligence. For example, caribou moved in herds, wolves moved in packs, and humans lived as a group and hunted in a distributed manner. The placement of hunting blinds will determine whether they hunt individually or in a collective fashion. The result of a player's placement of hunting blinds and campsites on the land bridge will be a function of the intelligence that each of the interacting species has. While these Paleo-Indians were hunter-gatherers, our focus is on the hunting component. Gathering will be added later.

Figure 5. An example object hierarchy for the game agent classes

Game Object Organization

The game supports the coupling, or interaction, of human and natural systems. The basic object hierarchy supported by the game is given in Figure 5. The key is that the terrain object is constantly being updated through the survey process. Changes in its composition then need to be reflected through changes to the information used in the construction of the state machines for the three different intelligent agent object classes: human, caribou, and wolf. Each state machine encoded basic individual behaviors based on its environmental context. Thus, the key to the use of the agile technology here is to synchronize changes between the terrain and game objects. In this section we focus on the game objects, and in the following section we focus on the terrain objects.
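As one concrete illustration of the kind of state machine described here, the short sketch below, in the Python used for the class, shows an agent state machine whose transition table is rebuilt whenever the terrain description changes. The state names and the rebuild hook are illustrative assumptions, not the course code.

class AgentStateMachine:
    """Minimal finite-state machine for a game agent (caribou, wolf, or human)."""

    def __init__(self, terrain):
        self.state = "wander"
        self.build_transitions(terrain)

    def build_transitions(self, terrain):
        # Called again whenever the survey updates the terrain object, so the
        # behavioral rules stay synchronized with the landscape description.
        self.transitions = {
            ("wander", "sees_food"): "forage",
            ("wander", "sees_threat"): "flee",
            ("forage", "satisfied"): "wander",
            ("flee", "safe"): "wander",
        }

    def update(self, event):
        # Unknown (state, event) pairs leave the agent in its current state.
        self.state = self.transitions.get((self.state, event), self.state)

For example, an agent created with fsm = AgentStateMachine(terrain) moves to the "flee" state after fsm.update("sees_threat") and returns to "wander" after fsm.update("safe").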
Each group of agents therefore possesses both individual and social intelligence. Each component will now be described. The individual cognitive capabilities supported by the system are as follows.

Sensing their immediate physical environment: Every object class has a certain set of cognitive capabilities in terms of being able to identify objects within its local area. The capabilities vary from class to class, and can be adjusted over game levels if necessary.

Sensing their immediate social environment: Each object can sense its relative position in its social group, whether it is a herd, a pack, or a family. We added behaviors for the caribou such as alignment, flee, and separation in order to reflect the ability of the caribou to move as a herd in a
line, to avoid obstacles, wolves, hunters, and so on, and to separate into more than one herd if they were attacked by the hunters or the wolves (a small sketch of these steering rules follows this list of capabilities).

Basic goals: Caribou, wolves, and humans all have a goal of survival, which requires a certain amount of food, water, and other resources daily. Humans, for example, will also need firewood for cooking and warmth, as well as stone for tool making.

Path Planning: A basic set of waypoints is located within the region. Individual agents from each category, human, caribou, or wolf, can plan their paths in order to avoid obstacles, to move in herds, and to achieve goals such as attaining food and water.

State Machines: Each category of agents inherits a state machine from its class. The state machine keeps track of its current state in terms of its food and resource needs, its active goals, and its perceived environment.

Learning capabilities: Cultural Algorithms have been successfully used to acquire social intelligence in a variety of applications and will be used to support organizational learning here (Reynolds & Ali, 2008; Reynolds, Ali, & Jayyousi, 2008).
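The herd behaviors named above are classic steering rules. The following is a minimal Python sketch of how they might be combined for a caribou agent, assuming each animal carries 2-D position and velocity tuples; the weights and data layout are illustrative assumptions, not the values used in the class project.

def steer_caribou(me, neighbours, threats, w_align=1.0, w_separate=1.5, w_flee=2.0):
    """Combine alignment, separation, and flee into a single steering vector.
    'me', each neighbour and each threat is a dict with 'pos' and 'vel' 2-tuples."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1])
    def add(a, b):
        return (a[0] + b[0], a[1] + b[1])
    def scale(v, s):
        return (v[0] * s, v[1] * s)

    steer = (0.0, 0.0)
    if neighbours:
        # Alignment: steer toward the average heading of nearby herd members.
        avg_vel = scale(tuple(map(sum, zip(*(n["vel"] for n in neighbours)))),
                        1.0 / len(neighbours))
        steer = add(steer, scale(sub(avg_vel, me["vel"]), w_align))
        # Separation: keep a little distance from each close neighbour.
        for n in neighbours:
            steer = add(steer, scale(sub(me["pos"], n["pos"]), w_separate / len(neighbours)))
    # Flee: head directly away from any wolf or hunter within sensing range.
    for t in threats:
        steer = add(steer, scale(sub(me["pos"], t["pos"]), w_flee))
    return steer

Applied to every herd member each tick, rules of this kind are what produce the emergent single-file movement and herd splitting discussed in the Results section.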
The Terrain Object

At the onset of the project, there was some information about the general topography of the region, as shown by the computer-generated GIS representation in Figure 6. However, prior to the survey it was not clear what level of detail might actually be found on the bottom, that is, whether there would be evidence of non-natural structures made by humans as well as other evidence suggesting the past distribution of resources. As the survey proceeded, it became clear that there was sufficient evidence for campsites and hunting blinds to allow them to be included in the model, as well as
behaviors that would utilize them. Therefore, as more information was garnered from the survey, the requirements for the game changed since there was now more information that the game could be used to predict.
At the beginning of the game design, hunting blinds and campsites were not considered, but once it was clear that there was sufficient evidence for them they were added into the game and the objects' behaviors adjusted to include them. Without an agile approach, synchronizing the project requirements with the survey results would have been very difficult to achieve. And, without the online component it would have been difficult to effectively implement the agile technology in a situation that lacked aspects of collocation that are critical to traditional agile approaches.
When the development of the game first began, campsites and hunting blinds were not considered to be used. The preliminary results of the survey suggested that remains of such structures were present, so we added in those features along with the AI required to exploit them. We then can observe how adding this new information in can affect the behavior of the various agents. Figure 7 gives a screen shot of the main menu of the game containing three options. The three options are as follows: "Play Game" to load and start playing the game; "Options" to change the difficulty of the game; and "Exit" to close the game application. The options screen gives the user a chance to configure the game and to change the difficulty level of the game. The menu has four main options that can be useful in the modification of the number of wolves, caribou, and hunters.

RESULTS

Figure 8 gives a screen shot of the game environment containing spruce forest, water features, and rock formations along with hunting blinds and campsites. One can see caribou herds, hunters, and wolves distributed over the landscape as well. In this figure wolves are attacking the caribou. Hunters are running to find ambush sites ahead of the herd. Notice the emergent property of the herd to split into smaller herds as a result of an attack. This behavior is observed in real caribou and emerged here as a result of the interaction of herd members. The semi-circle of rocks corresponds to a man-made hunting blind.
Figure 9 presents another emergent behavior produced by the model. In the figure the herd of caribou moves in a line, with one individual following another. This behavior is observed in real caribou and emerges here as a result of the individual movements of the herd members.
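The herd behaviors mentioned earlier (alignment, separation, and fleeing from threats) are classic steering rules, and the emergent splitting and line formation fall out of combining them. The following is a minimal, illustrative sketch of such rules; the vector type, weights, and class names are assumptions, not the game's actual implementation.

```java
import java.util.List;

// Illustrative steering rules for a single herd member.
class Vec {
    double x, y;
    Vec(double x, double y) { this.x = x; this.y = y; }
    Vec add(Vec o) { return new Vec(x + o.x, y + o.y); }
    Vec sub(Vec o) { return new Vec(x - o.x, y - o.y); }
    Vec scale(double s) { return new Vec(x * s, y * s); }
}

class HerdMember {
    Vec pos = new Vec(0, 0), vel = new Vec(0, 0);

    // Combine alignment with neighbours, separation from crowding, and fleeing from threats.
    Vec steer(List<HerdMember> neighbours, List<Vec> threats) {
        Vec align = new Vec(0, 0), separate = new Vec(0, 0), flee = new Vec(0, 0);
        for (HerdMember n : neighbours) {
            align = align.add(n.vel);                // match neighbours' heading
            separate = separate.add(pos.sub(n.pos)); // keep a minimum spacing
        }
        for (Vec t : threats) flee = flee.add(pos.sub(t)); // move away from wolves and hunters
        return align.scale(0.05).add(separate.scale(0.1)).add(flee.scale(0.5));
    }
}
```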

Figure 6. A GIS (geographical information system) representation of the topography of the land bridge

Figure 7. A screen shot of the main menu of the game. The user can start playing the game, change the difficulty level, and exit the game.


Figure 8. A screen shot of the game as caribou move from the northwest to the southeast along a portion of the land bridge

Figure 10 is a screen shot of the caribou herd moving in a line as they head south while avoiding the water and rocks. They are also able to avoid hunters and wolves. This avoidance of objects gives the hunters an opportunity to produce drive lanes. These lanes force the herd into a linear pattern within a narrowing area, which makes hunting easier.

This emergent asymmetry provides us with new information that we can use to answer the following questions:

1. Is there a positioning of blinds and sites that exhibits equal kill probabilities for both north and south migrations?
2. How does this compare to the optimal positioning of blinds and sites for north migration or south migration alone?

Figure 9. In this screen shot the caribou are moving in a line across the land bridge. Notice that caribou
avoid the hunters in their path.


Figure 10. The caribou herd moves along the water’s edge while avoiding the hunter

3. If the north and south optimal locations are different, what are the differences and why?
4. For an observed archaeological configuration of sites and blinds on the land bridge, can we infer when they were used during the year?

For example, we placed the hunting blinds discovered by the archaeological survey in their proper positions on the land bridge. We then had the herd simulate southern and northern migrations respectively. The question of interest is whether the positioning of the blinds produces a better kill count for northern or southern migration. The presence of a significant difference may allow us to infer the season in which the blinds were most useful. As it turned out, blind placement was significantly more effective in north to south migration rather than from south to north. This suggests that the primary hunting season was in the fall and that the positioning of the currently observed hunting blinds was to support fall hunting. These are new insights that the fusion of survey and gaming approaches through an agile technology online has produced. The agile approach online has taken advantage of the synergy of the two different technologies to produce insights that neither one could have produced on its own.

Figure 11. An autonomous underwater vehicle to be used for future surveys


FUTURE WORK

In this study we used an agile software design methodology to produce a real-world game that can be used to help predict the location of Paleo-Indian sites on a submerged prehistoric land bridge. It was critical to the success of the project that results of the ongoing underwater survey be synchronized with the current requirements of the developing game. Specifically, as new information about the environment's contents was obtained, adjustments would be needed to the Artificial Intelligence of the game in order to exploit them. However, some key requirements for an agile approach, such as co-location of the expert and the developers, were missing. With the use of online technology we were able to work around these obstacles.
As a result we have produced a computer game that fuses the most recently collected data with the requisite AI to exploit it. In order to support the incremental addition of computational intelligence into our game an Agile Program Design methodology was employed. The results of playing the game suggest that there is an inherent asymmetry to the migration pathways of the caribou. That is, the positioning of blinds and the related campsites is a function of migration direction. This means that we are relating the positioning of non-natural objects with the different seasons of the year. That is an additional level of understanding that we had not bargained for at the onset of the project.
Future work will involve the use of manned and autonomous underwater vehicles such as that shown in Figure 11. Since sites are between 60 and 200 meters below the surface, this means that for the deepest sites the US Navy recommends at most 20 minutes exposure at those levels, with 44 minutes of decompression in three stops on the way up. Therefore, it will be important to pinpoint areas that will be most likely to yield important information about prehistoric occupants in order to make diving expeditions more effective. The results produced by the game so far suggest that it will be an important factor in making good decisions as to where to deploy these devices.
In addition, we will consider adding the ability of agents to learn to coordinate their own hunting activities using the Cultural Algorithm, a socially motivated hybrid learning algorithm developed by Reynolds (2008). We can then compare the best learned results with those of the human player. Likewise, we can allow caribou herds and wolf packs to adjust their dynamics in order to achieve their own goals of survival.

ACKNOWLEDGMENT

This project was supported in part by NSF High-Risk Grant #BCS-0829324.

REFERENCES

Ambler, S. (2008). Scaling Scrum – Meeting Real World Development Needs. Dr. Dobbs Journal. Retrieved April 23, 2008 from http://www.drdobbsonline.net/architect/207100381.

Boehm, B., & Turner, R. (2004). Balancing Agility and Discipline: A Guide for the Perplexed. Addison-Wesley Press.

Croley, T., & Lewis, C. (2006). Journal of Great Lakes Research, 32, 852–869. doi:10.3394/0380-1330(2006)32[852:WADCTM]2.0.CO;2

Ellis, C., Kenyon, I., & Spence, M. (1990). Occasional Publication of the London Chapter. OAS, 5, 65–124.

Hough, J. (1958). Geology of the Great Lakes. Urbana, IL: Univ. of Illinois Press.

Larsen, C. (1999). Cranbrook Institute of Science Bulletin, 64, 1–30.

Lewis, C. (1994). Quaternary Science Reviews, 13, 891–922. doi:10.1016/0277-3791(94)90008-6


Lewis, C. (2007). Journal of Paleolimnology, 37, 435–452. doi:10.1007/s10933-006-9049-y

Lovis, W. (1989). Michigan Cultural Resource Investigations Series 1, East Lansing.

Monaghan, G., & Lovis, W. (2005). Modeling Archaeological Site Burial in Southern Michigan. East Lansing, MI: Michigan State Univ. Press.

Moore, T., Rea, D., Mayer, L., Lewis, C., & Dobson, D. (1994). Canadian Journal of Earth Sciences, 31, 1606–1617. doi:10.1139/e94-142

Reynolds, R. G., & Ali, M. (2008). Computing with the Social Fabric: The Evolution of Social Intelligence within a Cultural Framework. IEEE Computational Intelligence Magazine, 3(1), 18–30. doi:10.1109/MCI.2007.913388

Reynolds, R. G., Ali, M., & Jayyousi, T. (2008). Mining the Social Fabric of Archaic Urban Centers with Cultural Algorithms. IEEE Computer, 41(1), 64–72.

Shott, M. (1999). Cranbrook Institute of Science Bulletin, 64, 71–82.

Thomas, R., Kemp, A., & Lewis, C. (1973). Canadian Journal of Earth Sciences, 10, 226–271.

Warner, G., Hebda, R., & Hahn, B. (1984). Palaeogeography, Palaeoclimatology, Palaeoecology, 45, 301–345. doi:10.1016/0031-0182(84)90010-5

Wood, W., & Kleb, W. (2002). Extreme Programming in a research environment. In Wells, D., & Williams, L. (Eds.), XP/Agile Universe 2002 (pp. 89–99). doi:10.1007/3-540-45672-4_9


Chapter 12
Management of Distributed
Energy Resources Using
Intelligent Multi-Agent System
Thillainathan Logenthiran
National University of Singapore, Singapore

Dipti Srinivasan
National University of Singapore, Singapore

ABSTRACT
The technology of intelligent Multi-Agent System (MAS) has radically altered the way in which com-
plex, distributed, open systems are conceptualized. This chapter presents the application of multi-agent
technology to the design and deployment of a distributed, cross-platform, secure multi-agent framework to
model a restructured energy market, where multiple players dynamically interact with each other to achieve
mutually satisfying outcomes. Apart from the security implementations, some of the best practices in
Artificial Intelligence (AI) techniques were employed in the agent oriented programming to deliver
customized, powerful, intelligent, distributed application software which simulates the new restructured
energy market. The AI algorithm implemented as a rule-based system yielded accurate market outcomes.

DOI: 10.4018/978-1-60566-898-7.ch012

INTRODUCTION

The electricity grid is the backbone of the power network and is at the focal point of technological innovations. Utilities need to introduce distributed intelligence into their existing infrastructure to make them more reliable, efficient, and capable of exploiting and integrating alternative sources of energy. The intelligent grid includes the infrastructure and technologies required to allow distributed generation of energy while increasing the operational efficiency through distributed control and monitoring of resources.
An intelligent grid should be self-healing and reconfigurable to guard against man-made and natural disasters. One way to assure such characteristics in an electric power grid is to design small and autonomous subsets of the larger grid. These subsets are called intelligent microgrids, which are used as a test bed for conglomerate innovations in communication technologies, smart metering, co-generation, and distributed intelligence and control. The test bed serves to showcase the capabilities of the developed systems, thereby accelerating


the commercialization of technologies and solutions for smart grids all over the world.
Multi-agent system is one of the most exciting and fastest growing domains in agent oriented technology, which deals with the modeling of autonomous decision making entities. Multi-agent based modeling of a microgrid is the best choice to form an intelligent microgrid (Rahman, Pipattanasomporn, & Teklu, 2007; Hatziargyriou, Dimeas, Tsikalakis, Lopes, Kariniotakis, & Oyarzabal, 2005; Dimeas & Hatziargyriou, 2007), where each necessary element in a microgrid is represented by an intelligent agent that uses a combination of AI-based and mathematical models to decide on optimal actions.
Recent developments (Rahman, Pipattanasomporn, & Teklu, 2007; Hatziargyriou, Dimeas, Tsikalakis, Lopes, Kariniotakis, & Oyarzabal, 2005; Sueyoshi & Tadiparthi, 2007) in multi-agent systems have shown very encouraging results in handling multi-player interactive systems. In particular, the multi-agent system approach has been adopted to simulate, validate and test the open deregulated energy market in some recent works (Sueyoshi & Tadiparthi, 2007; Bagnall & Smith, 2005; Praça, Ramos, Vale, & Cordeiro, 2003; Logenthiran, Srinivasan, & Wong, 2008). Each participant in the market is modeled as an autonomous agent with independent bidding strategies and responses to bidding outcomes. They are able to operate autonomously and interact pro-actively within their environment. Such characteristics of agents are best employed in situations where role identities are to be simulated, as in a deregulated energy market simulation.
The dawn of the 21st century has seen numerous countries de-regulating or lobbying for deregulation of their vertically integrated power industry. The electric power industry has seen an evolution from a regulated to a competitive industry. The whole industry of generation, transmission and distribution has been unbundled into individual competing entities. Although the journey has been far from seamless, as observed in the California electricity crisis (Budhraja, 2001), many critics have agreed that deregulation is indeed a noble endeavour. The problems associated with deregulation can be solved with structural adjustments to the markets and learning from past mistakes.
This chapter shows the development and implementation of a multi-agent application for the deregulated energy market. The developed application software is a testament of the multi-agent framework implementation and the effectiveness of dynamic modeling of a multi-agent environment where the internal tasks of each agent are executed concurrently with external inputs from the agent world. Successful deployment of the application software coupled with a high degree of robustness indicates the relevance and operational level of multi-agent system based application software development. The user can use the software for any size of power system by defining the number of agents in the system and inserting the associated information.
The structure of the remaining chapter is as follows: Section 2 provides the introduction of microgrid and Distributed Energy Resource (DER), and Section 3 gives an introduction of the restructured electricity market. Section 4 describes the implementation of multi-agent system based application software for PoolCo energy market simulation. Section 5 demonstrates the flow of simulation of the implemented application software. Section 6 discusses results of the PoolCo outcome of a sample microgrid. Finally, it is concluded in the seventh section.

BACKGROUND

Microgrid and Distributed Energy Resource

Over the years, the computer industry has been evolving continuously and the power industry has remained relatively stable. In the past few years, the power industry also has seen many


revolutionary changes. The deregulated energy environment has favoured a gradual transition from centralized power generation to Distributed Generation (DG), where sources are connected at the distribution network. Several technologies, such as diesel engines, micro turbines, fuel cells, wind turbines and photovoltaic systems, can be part of a distributed generation system. The capacity of the DG sources varies from a few kWs to a few MWs. Distributed systems can also bring electricity to remote communities which are not connected with the main grid. Such multiple communities can create a microgrid of power generation and distribution.
Microgrids can be defined as low voltage intelligent distribution networks comprising various distributed generators, storage devices and controllable loads, which can be operated as an interconnected system with the main distribution grid, or as an islanded system if they are disconnected from the main distribution grid. The common communication structure and distributed control of DG sources together with controllable loads and storage devices, such as flywheels, energy capacitors and batteries, are central to the concept of microgrids (Lasseter, Akhil, Marnay, Stephens, Dagle, Guttromson, Meliopoulos, Yinger, & Eto, 2002). From the grid's point of view, a microgrid can be regarded as a controlled entity within the power system that can be operated as a single aggregated load and a small source of power or ancillary services supporting the network. From the customers' point of view, microgrids are similar to traditional low voltage distribution networks which provide local thermal and electricity needs. In addition, microgrids enhance the local reliability, reduce emissions, improve the power quality by supporting voltage, and potentially lower the cost of energy supply.

Deregulated Energy Market

Around the world, the electricity industry, which has long been dominated by vertically integrated utilities, is experiencing major changes in the structure of its markets and regulations (Lasseter, Akhil, Marnay, Stephens, Dagle, Guttromson, Meliopoulos, Yinger, & Eto, 2002; Shahidehpour & Alomoush, 2001). The power industry has become competitive because the traditional centralized operation is replaced with an open market environment. This transformation is often called the deregulation of the electricity market. Market structure varies from country to country depending on the policies adopted in the country. For example, the Independent System Operator (ISO) and the Power Exchange (PX) are separate entities in some countries' markets, like California's market, although the PX functions within the same organization as the ISO, while they are under the same structure with control of the ISO in some other markets.
To implement competition, vertically integrated utilities are required to unbundle their retail services into generation, transmission and distribution. Generation utilities will no longer have a monopoly. Even small business companies will be free to sign contracts for buying power from cheaper sources. Many new concepts (Shahidehpour & Alomoush, 2001; Shahidehpour, Yamin, & LI, 2002) have appeared to facilitate the way of dealing with restructuring. A few critical roles of these entities and concepts which are instrumental for understanding the multi-agent system based modeling of restructured energy markets are discussed here.

Independent System Operator (ISO)

The ISO is an entity independent of the individual players in the energy market, such as generation, transmission and distribution companies and end users. The ISO administers transmission tariffs, maintains the system security, coordinates maintenance scheduling, and has a role in coordinating long-term planning. The main purpose of an ISO is to ensure fair and non-discriminatory access to the grid, transmission lines and ancillary services. The ISO


manages the power flow over the transmission system and facilitates reliability requirements of the power system. The ultimate role of the ISO is to ensure that the total generation meets the demand by taking congestion and ancillary services into account. This function is carried out by controlling the dispatch of flexible plants and giving orders to adjust the power supply levels or curtail loads to ensure that loads match the available power generation in the system.

Power Exchange (PX)

The PX monitors and regulates the economic operation of the interconnected grid. It operates as an independent, non-government, non-profit entity which provides schedules for loads and generators. In the PoolCo model, the PX establishes an auction-like spot market where energy bids of the generators and consumers are matched anonymously based on the bidding quantities and prices. The PX finds the market clearing quantities and the market clearing price from the market equilibrium point. The PX operates a day-ahead market as well as an hour-ahead market separately, depending on the marketing period.

Transmission Companies (TRANSCOs)

The transmission system is the most essential element in electricity markets. The secure and efficient operation of the transmission system is necessary for efficient operation of these markets. A TRANSCO has the role of building, owning, maintaining, and operating the transmission system in a certain geographical region to provide services for maintaining the overall reliability of the electrical system. The use of TRANSCOs' assets comes under the control of the ISO and they are regulated to provide non-discriminatory connections and comparable services for cost recovery. The ISO oversees the operation and scheduling of TRANSCOs' facilities.

Generation Companies (GENCOs)

Generation companies are formed once the generation of electric power is segregated from the existing utilities. They take care of the operation and maintenance of existing generating plants. Electricity from them is either sold to short term markets or provided directly to the entities that have contracts with them for the purchase of electricity. Besides real power, they may sell reactive power and operating reserves. GENCOs include Independent Power Producers (IPP).

Customers

Customers are the end users of electricity with different load requirements. They may be broadly categorized into industrial, commercial and residential. In a restructured market, customers are no longer obligated to purchase any services from their local utility company. Customers have the right to access generators or other power providers directly and choose the best packages of services that meet their needs.

Physical Power System Constraints and Transmission Pricing

The agreements between trading parties are made based on the outcome of the energy market, which does not represent the actual power flow in the system. Constraints in the power system, for example transmission losses and contract transmission paths, affect the operation of the transmission system. Due to transmission line losses, the power injected at any node in the system to satisfy a certain demand at another node depends on the loss factors between the nodes. Transmission losses will affect the actual power injection pattern and the quantity injected into the network.
Another issue that affects transmission pricing is the Contract Path, which has been used between transacting parties as a dedicated path where power flows are assumed to flow through pre-defined


paths. However, physically electrons could flow in a network over parallel paths owned by several utilities that may not be through the contract path. As a result, transmission owners need to be compensated for the actual use of their facilities.
The above are just two of the many implications that power system constraints have on pricing in a restructured market. Though it is beyond the scope of this discussion and also beyond the scope of this application development, managing such constraints and their impacts on pricing is essential.

Market Models

The main objectives of an electricity market are to ensure secure and efficient operation and to decrease the cost of electricity through competition. Several market structure models (Praça, Ramos, Vale, & Cordeiro, 2003; Shahidehpour & Alomoush, 2001; Shrestha, Song, & Goel, 2000) exist all over the world. These market models differ in terms of marketplace rules and governance structure. Generally they can be classified into three types: the PoolCo model, the Bilateral contract model and the Hybrid model.
The PoolCo market model is a marketplace where power generating companies submit their production bids, and consumer companies submit their consumption bids. The market operator uses a market clearing tool to find the market clearing price and the accepted production and consumption bids for every hour. The bilateral contracts are negotiable agreements between sellers and buyers about power supply and reception. The Bilateral contract model is very flexible because negotiating parties can specify their own contract terms and conditions. Finally, the third market model is the Hybrid model, which is a combination of the PoolCo and Bilateral contract models. It has the features of the PoolCo as well as the Bilateral contract models. In this model, customers can either negotiate with a supplier directly for a power supply agreement or accept power from the pool at the pool market clearing price. For this software development, the ISO and PX are modeled as separate entities, like in California's energy market (Shahidehpour & Alomoush, 2001; Shahidehpour, Yamin, & LI, 2002), to illustrate their individual roles in the energy market, and the typical day-ahead PoolCo model is chosen because of its simplicity.
The PoolCo model (Shrestha, Song, & Goel, 2000) consists of competitive independent power producers, vertically integrated distribution companies, load aggregators and retail marketers. The PoolCo does not own any part of the generation or transmission utilities. The main task of the PoolCo is to centrally dispatch and schedule generating units in the service area within its jurisdiction. The operating mechanism of the PoolCo model is described in Figure 1. In a PoolCo market operation, buyers (loads) submit their bids to the pool in order to buy power from the pool, and sellers (generators) submit their bids to the pool in order to sell power to the pool. All the generators have the right to sell power to the pool but they cannot specify customers.
During PoolCo operation, each player will submit their bids to the pool which is provided by the PX. The PX sums up these bids and matches the interested demand and supply of both sellers and buyers. The PX then performs economic dispatch to produce a single spot price for electricity for the whole system. This price is called the Market Clearing Price (MCP), which is the highest price in the selected bids of the particular PoolCo simulation hour. Winning generators are paid the MCP for their successful bids while successful loads are obliged to purchase electricity at the MCP. Generators compete for selling power. If the bids submitted by generator agents are too high, they have a low possibility to sell power. Similarly, loads compete for buying power. If bids submitted by load agents are too low, they have a low possibility to get power. In such a model, generator bids with low cost and load bids with high cost would essentially be rewarded.
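The clearing rule just described can be sketched very simply: seller bids are stacked in ascending price order, buyer bids in descending order, and the MCP is taken from the most expensive seller bid that is still matched by demand. This is an illustrative simplification, not the chapter's actual market clearing engine; the Bid record and field names are assumptions.

```java
import java.util.*;

// Illustrative single-hour PoolCo clearing (not the chapter's actual engine).
record Bid(String owner, double quantityKw, double price) {}

class MarketClearing {
    // Returns the market clearing price, or NaN if no seller/buyer pair matches.
    static double clear(List<Bid> sellers, List<Bid> buyers) {
        sellers.sort(Comparator.comparingDouble(Bid::price));            // cheapest supply first
        buyers.sort(Comparator.comparingDouble(Bid::price).reversed());  // highest demand first
        double mcp = Double.NaN, sellLeft = 0, buyLeft = 0;
        int s = 0, b = 0;
        while (s < sellers.size() && b < buyers.size()
                && sellers.get(s).price() <= buyers.get(b).price()) {
            if (sellLeft == 0) sellLeft = sellers.get(s).quantityKw();
            if (buyLeft == 0) buyLeft = buyers.get(b).quantityKw();
            double traded = Math.min(sellLeft, buyLeft);                 // match as much as possible
            sellLeft -= traded;
            buyLeft -= traded;
            mcp = sellers.get(s).price();                                // highest accepted seller price so far
            if (sellLeft == 0) s++;
            if (buyLeft == 0) b++;
        }
        return mcp;
    }
}
```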


Figure 1. PoolCo market model

Figure 2. Flow of PX-ISO in market operation

Market Operation

The ISO and the PX handle the market op-


eration which can be a day-ahead market or an
hour-ahead market (Shahidehpour & Alomoush,
2001; Shahidehpour, Yamin, & LI, 2002). In the
day-ahead market, for each hour of the 24-hour
window, sellers bid a schedule of supply at vari-
ous prices, buyers bid a schedule of demand at
various prices, and market clearing price and
market clearing quantities are determined for
each hour. Then the PX schedules supply and
demand with the help of the ISO. The ISO finalizes
the schedules without congestion. An hour-ahead
market is similar to a day-ahead market, except
the total scheduling period is an hour instead of
a day. Typical market operation in a restructured
power system is shown in Figure 2.
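As a rough illustration of the day-ahead cycle described above, the clearing step can simply be repeated for each of the 24 hourly slots. The sketch below reuses the hypothetical Bid and MarketClearing classes from the previous example and is illustrative only.

```java
import java.util.*;

// Illustrative day-ahead loop: clear each of the 24 hourly slots independently.
class DayAheadMarket {
    static double[] clearDay(Map<Integer, List<Bid>> sellerBidsByHour,
                             Map<Integer, List<Bid>> buyerBidsByHour) {
        double[] hourlyMcp = new double[24];
        for (int hour = 0; hour < 24; hour++) {
            hourlyMcp[hour] = MarketClearing.clear(
                    sellerBidsByHour.getOrDefault(hour, new ArrayList<>()),
                    buyerBidsByHour.getOrDefault(hour, new ArrayList<>()));
        }
        return hourlyMcp;   // one market clearing price per hour of the scheduling day
    }
}
```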


IMPLEMENTATION OF MULTI-AGENT SYSTEM

Multi-Agent Platform

A multi-agent platform provides a platform for implementing the agent world and managing agents' execution and message passing. JADE (Java Agent DEvelopment framework) is the multi-agent platform chosen for this software development. JADE (http://jade.tilab.com/) aims at developing multi-agent systems and applications conforming to the Foundation for Intelligent Physical Agents (FIPA) standards (http://www.fipa.org/) for intelligent agents. JADE is a middleware, which means that JADE provides another layer of separation between the software and the operating system. In this implementation, the underlying operating system is the Java virtual machine. JADE is fully coded in Java, which is attractive for this implementation because Java has several advantages over other programming languages, such as extensibility, security and cross-platform support.
The JADE platform provides the Agent Management Service (AMS), Directory Facilitator (DF) and Message Transport System (MTS), which are the necessary elements in a multi-agent system as specified by the FIPA standards. A typical FIPA compliant agent platform is illustrated in Figure 3. The agent management service is responsible for managing the agent platform; it maintains a directory of Agent Identifiers (AIDs) and provides white page and life cycle services. The directory facilitator provides the default yellow page services in the platform, which allow the agents to discover the agents in the network based on the services they wish to offer. Finally, the message transport system provides a channel for agent communication and is responsible for delivering messages between the agents.

Structure of Software Packages

A set of generic service component software packages has been designed and developed for this software implementation, as shown in Figure 4. Agents, Behaviours, Ontology, Tools and Power System are the main packages developed in this agent framework.
The Agents package consists of a collection of different types of agents. The Behaviour package details a collection of different types of tasks assigned to various agent entities. The Ontology package specifies a set of vocabulary for the agent language which agents understand and use for their communication with each other in the framework. The Tools package implements a collection of tools which are used by the agents in the framework for managing and processing the auction.

Figure 3. FIPA compliant agent platform

Figure 4. Structure of software packages in the framework
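To make the platform services above concrete, the following is a minimal sketch of a JADE agent that registers a service with the DF (yellow pages) and handles incoming messages in a cyclic behaviour. It is illustrative only; the agent class, service type and names are assumptions, not the chapter's actual code.

```java
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;
import jade.lang.acl.ACLMessage;

// Illustrative JADE agent: registers with the DF and reacts to messages.
public class SellerAgent extends Agent {
    @Override
    protected void setup() {
        DFAgentDescription dfd = new DFAgentDescription();
        dfd.setName(getAID());
        ServiceDescription sd = new ServiceDescription();
        sd.setType("energy-seller");                 // hypothetical service type
        sd.setName(getLocalName() + "-selling");
        dfd.addServices(sd);
        try {
            DFService.register(this, dfd);           // publish the service in the yellow pages
        } catch (FIPAException e) {
            e.printStackTrace();
        }
        addBehaviour(new CyclicBehaviour(this) {
            @Override
            public void action() {
                ACLMessage msg = myAgent.receive();  // non-blocking read of the message queue
                if (msg != null) {
                    // handle CFP, ACCEPT_PROPOSAL, etc. here
                } else {
                    block();                         // wait until the next message arrives
                }
            }
        });
    }
}
```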


Figure 5. Different types of agents in the agent world

Finally, the Power System package comprises of received, PX disseminates this information back
data structures used to represent the state of the to the relevant market participants.
physical power system network. ISO Agent: ISO in this framework performs
the roles of a regulatory body. ISO seeks to ensure
Agents Package the authenticity of the bids and the stability of the
network. Bids are checked for violations and acted
The Agent package in this framework consists upon accordingly. ISO would conduct these checks
of several agents such as ISO, PX, Schedule with the help of a rule based engine customized
Coordinator, PoolCo Manager, Power System for ISO. In addition, network simulations are
Manager, Security Manager, Sellers and Buyers. carried out on the day schedules to ensure stabil-
This application software focuses only on the ity of the network with power world simulator.
restructured energy market simulation. The dif- ISO also maintains a database of day schedules.
ferent entities would interact in this framework As mentioned earlier, ISO has the broad role of
to simulate a day-ahead market. A test system seeing to the system operation and the stability
with three sellers and five buyers is considered of the network in this project.
for the case study, however it can be extended for Security Manager Agent: The security manager
any number of market participants. The Figure 5 is an overall central information security hub that
shows some of the main agents implemented in provides all encryption, decryption, encryption
this framework. keys generation, issue of digital certificates and
PX Agent: PX agent has been customized for other security related services to the agent world.
purpose of modeling the restructured energy All agents have to register with security manager
market. PX acts as a middle man between the to make use of the security services in the network.
various market participants and ISO. Market As all message transmission is done through the
participants will submit their bids to the pool. PX Secured Socket Layer (SSL) protocols, agents
performs scheduling through Schedule Coordina- which do not register with the security manager
tor (SC) agent. The schedule coordinator will will have no access to the SSL service thus they
collate all the bids and determine a schedule using will not be able to communicate with any other
the market clearing engine. agents in the network.
For any particular day schedule, PX will also Authorized agents will have valid ID in the
scans for any violation bids which are sent back agent world and these IDs are used to register
by the ISO. If any vectorized violated bids are with security manager. In this application soft-


Figure 6. Security architecture of the agent world
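The dual-layer idea shown in Figure 6 (encrypt the message itself, then send it over an encrypted channel) can be illustrated with standard Java cryptography. The following is a toy sketch of the message-level step using plain RSA on a short payload; it is an assumption-laden illustration, not the framework's actual message or channel encoding agents.

```java
import javax.crypto.Cipher;
import java.security.KeyPair;
import java.security.KeyPairGenerator;

// Toy illustration of message-level encryption; channel-level security (SSL) would wrap the transport.
public class MessageEncryptionDemo {
    public static void main(String[] args) throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(2048);
        KeyPair receiverKeys = kpg.generateKeyPair();     // receiving agent's key pair

        String bid = "PROPOSE 379.0 kW @ 27.7";           // hypothetical bid payload
        Cipher enc = Cipher.getInstance("RSA");
        enc.init(Cipher.ENCRYPT_MODE, receiverKeys.getPublic());
        byte[] ciphertext = enc.doFinal(bid.getBytes());  // only the receiver can read this

        Cipher dec = Cipher.getInstance("RSA");
        dec.init(Cipher.DECRYPT_MODE, receiverKeys.getPrivate());
        System.out.println(new String(dec.doFinal(ciphertext)));
    }
}
```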

ware, security architecture for intelligent agents • The encryption process is done by two sep-
is employed in the network as shown in Figure 6. arate entities (Message encoding agent and
Security manager has exclusive access to mes- channel encoding agent). Unlike systems
sage encoding agent and channel encoding agent with only one encrypting entity where
which are mainly responsible for providing en- all encryption are done centrally, it takes
cryption of messages and channels services to all twice the effort to search or guess a key
agents. All agents who wish to send messages same as the generation key is done by two
across the network will have to engage their ser- separate entities.
vice in encrypting the message and channel which • The channel encryption provides dual level
they are going to send through. The agents need of security. Every time a message is sent
to contact the security manager agent, upon the between two agents, a new secured chan-
authentication; message encoding agent and chan- nel is established. The encryption key used
nel encoding agent provide security services to to establish the secured channel is always
the agents. Message encoding agent will provide different. Since the channel encryption is
encryption service for the sending agent after always different, the key value for decryp-
receiving encryption order from security man- tion is also always different. This makes it
ager and channel encoding agent will encrypt the even harder for unauthorized interception
message channel between the sending and receiv- of messages to succeed.
ing agent after receiving encryption order from
security manager. When the message is success- Behaviour Package
ful encrypted, sending agent will send the en-
crypted message through the encrypted channel. In multi agent systems, agents with different na-
Such architecture provides double redundancy ture are endowed with different attributes, beliefs
in communications security for extra security. For and objectives. These behaviours are added to
any hacker to successfully decrypt the messages internal tasks of agents. Behaviour is basically
sent between two agents, it is needed to break the an event handler. Behaviours in the agents are
encryption of the channel and then the message executed in an order as arranged by the JADE
itself. This is difficult to achieve for the follow- internal behaviour scheduler. This scheduler is
ing reasons. responsible for organizing and arranging all the


Figure 7. Schematic of behaviours package

behaviours of each and every agent. The Figure 7 8 shows some of the main ontologies implemented
shows some of the main behaviours implemented in this framework.
in this framework. Back to the software programming of the
implementation; when an agent sends information
Ontology Package to another agent, it will be serialized in a fashion
which is normally not understood by another agent
The concept of ontology can be imagined as unless it knows what ontology is used. Once the
vocabulary in an agent world for the communi- agent knows which class of ontology does this
cation. By defining any information as a specific information belongs to, it will be able to “decode”
ontology, it will look like our speaking language. and “reassemble” the information in a meaningful
Once the agent determines what language this way such that it can read useful information from
information is coded in, it will try to retrieve an it. Some of the ontologies implemented in this
ontology “decoder” from the ontology package. framework are given in details below.
With the help of these “decoders”, the receiving Bid: This is a class for all bids submitted by
agent though not speaking the native language will sellers and buyers to the PoolCo. These bids
be able to understand the other agent. The Figure contain the bid price, quantity and much other

Figure 8. Schematic of ontology package


Figure 9. Structure of contract ontology


buyers and sellers like location of electricity pur-
chased, location of electricity injected, payment
method and the other related issues pertaining to
the conditions of sale.
Contract: This is the integrated ontology of
bid and, terms and conditions ontologies. The bid
defines the price and quantity of the electricity
sale. The terms and condition object defines all
contractual issues pertaining to the electricity sale
agreement. As the name suggest, these information
will be embedded in the contract object defining
all scopes of the electricity transaction details.
Figure 9 shows the structure of this ontology.
Agent Record: This ontology implements a
data structure record of every agent with regards
peripheral information like the owner of bid, date to its participation in the market network. Every
of submitted, date of expiry, type of bid (seller agent holds a copy of its own record. The PoolCo
bid or buyer bid) and originating bid address. also maintains a collection of all the agents on
Owner of the bid refers to the name of the agent the network in the form of agent record object.
in the agent world and it also includes information The PoolCo keeps these agent record objects in
about the physical address of the agent where it a secure hashtable for its own reference and man-
is residing on. Information on the date which a agement of players. Each agent record contains
bid is submitted is useful when tracking the bids’ the information about the agent such as subscrip-
order for processing. Information on the expiry tion status with PoolCo, proposals status with the
date of a bid is very important because it is used PoolCo, owner of this agent, current bid submit-
by ISO for finding the violation bids. ted to the PoolCo, successful bid by the PoolCo,
Terms and Conditions: This ontology is an record of its own message history with all other
abstract object that defines the specific terms and agent entities and state of the owner agent in cur-
conditions. This object will be embedded together rent negotiations. Figure 10 shows the structure
with the bid object to form the contract object. of this ontology.
This object specifies contractual issues pertaining Market Clearing State: This ontology is a data
to the agreement on the electricity sale between structure detailing outcome of an auction. It in-

Figure 10. Structure of agent record ontology


Figure 11. Structure of market clearing state


Seller 1 sent a bid with 95kW even though it
agreed to sell at least minimum of 100kW. There-
fore Seller 2 violated its maximum limit and
Seller 1 violated its minimum limit.

Tools Package

This package provides a set of generic and specific


tools used in this framework. They are provided as
public type services to all agents. Some of these
tools like ISO rule engine provide provisions
for expansions. The necessary extensions can be
applied in future. The Figure 12 shows the tools
implemented in this framework and some of them
cludes information on market clearing price, total are given in details below.
successful volume of electricity transacted, suc- ISO Rule Base System: These rules belong to
cessful buyers, successful sellers and details about the operation of ISO and implemented to check
the market is experiencing excess demand or the hourly bids of the day schedule based on the
supply at the going market clearing price. Sche- set of user defined rules. In this framework, only
matic of this object is as shown in the Figure 11. three main rules are developed. First rule is to
Violation Bid: This is a class that deals with ensure that the date and time of bid submission
violated hour bids. An hour bid that violates the for every hour is not violated. Market participants
rules and conditions laid out by ISO which is in this simulated environment are required to
carried out by the ISO rule engine will be created submit their 24-hour bids for a particular day on
as a violation bid. The violation bid object will or before at 12:00 PM of the day before the
contain details of the faulty, amended bids and simulation day. Any time submission violation
the type of violation. In a particular day schedule, would result in a monetary penalty for the concern
all violation bids will be collated into a vector participant. After sieving out the time violation,
form and the vectorized violated bids will be sent the violated bid would then be vectorized into a
to PX as an encoded message. Table 1 shows vector of violation bids. The second and third
examples for these three rules of violation of bid- rules are being carried out on the quantity of sell-
ding implemented in this software. ers’ bids. Sellers in the network are allowed to
Time violation is shown for Buyer 1 because produce pre-determined maximum and minimum
it sent a delayed bid which can be seen in table. quantities. Seller quantity bids are checked to
Seller 2 sent a bid with 420kW even though it ensure that the maximum and minimum allowable
agreed to sell up to a maximum of 400kW and limits are not violated.

Table 1. Different types of violated bids

Market Participant | Bid Received Time   | Bid Received For    | Quantity | Price
Buyer 1            | 01-04-2009 12:00 PM | 01-04-2009 11:00 PM | 379.0    | 27.7
Seller 2           | 31-04-2009 11:00 AM | 01-04-2009 03:00 PM | 420.0    | 28.7
Seller 1           | 01-04-2009 12:00 PM | 02-04-2009 11:00 PM | 95.0     | 32.0
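The three ISO rules described above (submission deadline, seller maximum, seller minimum) can be sketched as simple predicates. This is an illustrative outline only; the method names, types and the exact deadline convention are assumptions, not the framework's rule engine.

```java
import java.time.LocalDateTime;

// Illustrative versions of the three bid-checking rules applied by the ISO rule base.
class BidRules {
    // Rule 1: the bid must arrive on or before 12:00 PM of the day before the simulation day.
    static boolean violatesDeadline(LocalDateTime received, LocalDateTime simulationDay) {
        LocalDateTime deadline = simulationDay.toLocalDate().minusDays(1).atTime(12, 0);
        return received.isAfter(deadline);
    }

    // Rules 2 and 3: a seller may not bid above its maximum or below its minimum capacity.
    static boolean violatesQuantityLimits(double bidKw, double minKw, double maxKw) {
        return bidKw > maxKw || bidKw < minKw;
    }
}
```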


Figure 12. Tools in the framework

Content Slot Coding: This is a main encoding counter proposals. Effectively, the initiating agent
and encryption engine used for scrambling mes- has to pick from the presented contracts and cannot
sages sent from one agent to other. The message negotiate the price. The advantage of contract-net
is first serialized and then encoded using the is that it distributes computing, allowing the spe-
Base64.jar which is the encoding file used for cific agent which started a contract net process to
JADE serialization and transmitting sequences be responsible for evaluating bids and deciding
of bytes within an ACL Message. Further, it is based on its own rules which contracts to accept.
encrypted using the OpenSSL package to apply It also separates internal agent information from
strong encryption on the message which will be one another, since agents only communicate
used in the RSA algorithm. through the defined contract-net protocol and all
calculations are done internally to each agent.
Coordination between Agents Since the agents can change at every contract-net
cycle, there is no dependence on a specific agent.
The coordination between agents is an important A system with more complex negotiation might
issue in the MAS. In an energy market model, the lead to lower costs for the system. However,
agents coordinate (Koritarov, 2004; Krishna, & simple contract- net is sufficient to demonstrate
Ramesh, 1998; Krishna & Ramesh, 1998a) among a distributed coordination framework.
themselves in order to satisfy the energy demand A directory service allows agents to register
of the system accomplish with the distributed themselves and publish their capabilities. By us-
control of the system. The coordination strategy ing a directory service, agents do not have to be
defines the common communication framework aware of the other agents. For example, a load
for all interactions between agents. Simple con- agent will look up sources in the directory every
tract-net coordination is chosen for the process- time it wishes to secure a new supply contract.
ing of wholesale market because of its simplest This allows for agents to be added or removed
coordination strategies. All discussions between from the system at any time since agents are
agents are started simply by a requesting agent included in contract-net negotiations once they
asking the other agents for a proposed contract register themselves with the directory service. The
to supply some commodity, and then awarding coordination layer that the approach defines is the
contracts from the returned proposals in a fashion strategic layer above the real time layer. Because
that minimizes cost or fulfils some other goal. The of the time required for a contract-net interaction
disadvantage of simple contract-net coordination to complete, and since contracts are assigned in
is only simple negotiation without allowing for discrete time intervals, this coordination layer


Figure 13. Communication between the agents

cannot address real time issues. The coordination non-contractual binding and SSL communications,
layer allows for the distributed agents to plan how contractual binding communications and finaliza-
resources should be applied for satisfying demand. tion and sealing of contracts. The general flow of
The actual operation of the system components the software simulation can be seen in Figure 14.
self regulates through negative feedback since The Figure 15 shows multi-agent system
the system cannot produce more energy than is launching. It is started via agent launch pad by
consumed. Figure 13 shows the overall commu- which all the administrative agents such as
nication between the agents in this simulation. PoolCo manager agent and its subordinate agents,
security manager agent and its subordinate agents,
and power system manager agent are launched.
SIMULATION OF DEVELOPED Buyer agents and seller agents are created and
SOFTWARE launched as static agents in a local machine.
After seller agents and buyer agents are cre-
The Multi-agent framework and generic service ated, they will execute their own thread to initial-
components implemented in the software are ize their generation capacities, load requirements
integrated to deploy a simulation application of and bidding price by obtaining these values from
modeling of restructured energy market. This a centrally held database in the simulation envi-
simulation framework consist of four main states ronment. When all the parameters of the agents
namely, agent world creation and initialization, are properly initialized, each agent will autono-

Figure 14. General flow of software simulation
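In JADE terms, the bidding round summarized in Figure 14 is a CFP / PROPOSE / ACCEPT_PROPOSAL (or REJECT_PROPOSAL) exchange. The sketch below shows the PoolCo-manager side of that exchange with standard ACL performatives; the class, method names and message contents are illustrative assumptions, not the chapter's actual code.

```java
import jade.core.AID;
import jade.core.Agent;
import jade.lang.acl.ACLMessage;
import java.util.List;

// Illustrative PoolCo-manager-side bidding round.
public class PoolCoManagerSketch extends Agent {
    void callForProposals(List<AID> subscribers) {
        ACLMessage cfp = new ACLMessage(ACLMessage.CFP);
        for (AID participant : subscribers) cfp.addReceiver(participant);
        cfp.setContent("submit-24h-bids");                 // hypothetical content
        send(cfp);                                         // broadcast the call for proposals
    }

    void notifyOutcome(AID bidder, boolean accepted) {
        ACLMessage reply = new ACLMessage(accepted ? ACLMessage.ACCEPT_PROPOSAL
                                                   : ACLMessage.REJECT_PROPOSAL);
        reply.addReceiver(bidder);
        reply.setContent(accepted ? "cleared-at-MCP" : "not-cleared");
        send(reply);                                       // inform the bidder of the outcome
    }
}
```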


Figure 15. Initialization of the agent world

mously register itself with the DF as their first execution. After retrieving the necessary directory
task. The process is illustrated in Figure 16. listing of various agents in the network, each agent
As soon as the agents registers themselves with will contact the security manager for allocation
the DF, the agents will query the DF for a complete of ID keys to be used for encryption purpose and
listing of agents and their services in the network SSL algorithm engine as shown in Figure 17.
using a certain search constraints. These search As soon as the agents registered for security
constraints are usually queries to the DF for agents services, all further communication on the network
with a certain types of services or agents with will be encrypted. When PoolCo is ready to com-
certain types of names. Sellers will send a query municate with all player agents, it will broadcast
to the DF on all other buyer agents and PoolCo an OPEN message as shown in Figure 13. All the
manager agent. Buyers will also end a query to player agents who wish to take part in this round
the DF on all other seller agents and PoolCo of bidding, will respond by sending a SUBSCRIBE
manager agent. The DF will respond with listing message to subscribe PoolCo manager agent.
of all agents that match their search constraints, PoolCo will close the subscription window after
and all the physical addresses of these agents. everyone in the network has subscribed or when
With this information, agents will be able to the subscription date is expiry, whichever is ear-
autonomously contact them at their own thread of lier.

Figure 16. Registration and query of the agents


Figure 17. Role of the security manager in the agent world

Once everyone in the network has subscribed and SC agent are the entities used to model the
PoolCo manager agent, the PoolCo manager agent internal division of PoolCo management. PoolCo
issues an AGREE message to agree all agents manager agent is the front-door communication
who have signed up for the subscription service. entity for representing the bidding system.
This reply is to confirm their subscription. When CFP message will arrive at the player agents
the AGREE message arrives, player agents will who have previously subscribed to the PoolCo
stop their own internal execution of whatever task service. Upon receiving this message, agents will
they are involving with (e.g. receiving updates stop their execution as before and handle this
from the DF for listing of agents in the network, newly arrived message. The player agents prepare
manipulating input data from DF) to handle this themselves for bidding if they are interested to
newly arrived message and they will record this participate in this round of bidding. In this stage,
correspondence in their internal message history player agents will submit the formal bids to the
database. All message exchanges are recorded by PoolCo manager agent. PoolCo manager agent will
each and every agent in their own internal message process these bids and send the results to them.
history database. After that, they will resume their These submissions of bids and replies from PoolCo
operation at whatever they were doing before. manager agent are legally binding contracts. Buy-
At the same time, they will continue to listen for ers who submitted bids to buy are legally obligated
new messages. to buy the quantity of power at bided price. The
After PoolCo manager agent sends out AGREE same things are applied for sellers too. Agents,
message to every agent who sent subscription who are interested for submitting bids, will have
message, PoolCo manager agent will also update access to their internal bidding records. They will
its own internal message history database and pro- prepare the necessary parameters like price and
ceed to prepare for a call for proposal broadcast. quantity of electricity to buy or offer in the market
Once it prepared Call For Proposal (CFP), it will and the prepared parameters will be encoded as
retrieve the list of subscribed agents in the network a bid object. When encoding is completed, they
and send a CFP message to all the subscribers. will send a PROPOSAL message to PoolCo man-
After PoolCo manager agent sent CFP message, it ager agent with the bid object enclosed. PoolCo
will also send a message to ISO agent, PX agent manager agent receiving up on the PROPOSAL
and SC agent to initialize them and prepare for message will re-directed these messages to PX
eminent auction and scheduling task. PX agent agent for recording. PoolCo manager agent will


only close the proposal window after everyone in the network has submitted their proposals or the proposal window expiry date is due, whichever is earlier. The proposal expiry date is by default one day after the PoolCo manager agent sent out its CFP message. After the proposal window is closed, the PX agent will process the collected bids through a series of sorting and tagging operations. The whole set of data will be hashed into a hashtable, and this hashtable will be sent to the SC agent. At the same time, the SC agent will send a message to the PoolCo manager agent to notify that scheduling is in progress.

The SC agent has an algorithm which computes a data structure that represents the aggregated demand and the aggregated supply with respect to the price component using a rule-based system. These sets of data will be processed to produce a single spot price at the market equilibrium where the demand meets the supply. This price is called the Market Clearing Price (MCP). The SC agent will also calculate the quantity of electricity transacted at this price. The PX agent will also determine the successful buyer agents and seller agents in this round of bidding based on the MCP and the quantity of electricity transacted. Then the whole set of data will be sent to the ISO agent to check for violation of bidding as well as to check for congestion of scheduling. If any bidding is violated or the scheduling is congested, the ISO will take the necessary actions. Otherwise, the whole set of data comprising the MCP, the quantity of electricity transacted, the list of successful buyer agents and seller agents, and the list of unsuccessful buyer agents and seller agents will be sent to the PoolCo manager agent.

After receiving this data, the PoolCo manager agent extracts the relevant information and sends it to the power system manager agent so that the power system manager can update the power system state. PoolCo will also extract the list of successful bidders from the set of data and send an ACCEPT PROPOSAL message to the successful bidders, embedded with details of the successful bids. PoolCo will also extract the list of unsuccessful bidders from the data and send a REJECT PROPOSAL message to the unsuccessful bidders. All bidders will be notified of their bidding outcomes at the end of every bidding round.

Agents who receive an ACCEPT PROPOSAL message will record their successful bid object and update their internal records. Then they send an OK message to the PoolCo manager agent to acknowledge the contract. Agents who receive a REJECT PROPOSAL message will record their unsuccessful attempt and make changes to their internal records.

This whole process is one round of bidding in the PoolCo model for one slot. In the case of the day-ahead market, it is for one hour of the 24 hour slots. Agents usually submit a complete schedule of 24 bids, representing their bids for the day-ahead market.

RELIABILITY CHECKING OF MICROGRID

Once the market is simulated, before the scheduling is proposed, the stability and reliability of the power network is checked using the power world simulator in order to ensure that the scheduling does not undermine the stability and reliability of the system. Power world simulator is a commercial power system simulation package based on a comprehensive, robust power flow solution engine which is capable of efficiently solving systems of up to 100000 buses. It also allows the user to visualize the system through the use of animated diagrams providing good graphical information about the technical and economic aspects of the network. A snapshot of the power world simulator is shown in Figure 18. It has several optional add-ons; the OPF and SimAuto add-ons are integrated in this software development.

The optimal power flow (OPF) provides the ability to optimally dispatch the generation in an area or group of areas while simultaneously enforcing the transmission line and interface limits. The advantages of this OPF over other commercially available optimal power flow packages are


Figure 18. A snapshot of the distributed power system

its ability to display the OPF results on system one-line diagrams and contour the results for ease of interpretation, and the ease with which the users can export the OPF results to a spreadsheet, a text file, or a power world AUX file for added functionality. SimAuto is an automated server that enables the user to access the functionalities from a program written externally by Microsoft

Figure 19. Demonstration of successful implementation of MAS


Component Object Model (COM) technology. Even though Java does not have COM compatibility, Java integrates the Java Native Interface (JNI), which is a standard programming interface between Java programs and COM objects. JNI allows the Java virtual machine to share a process space with platform native code.

If the schedule generated for the microgrid results in congestion, the ISO would employ the power world simulator to mitigate congestion. The purpose of the OPF is to minimize the cost function by changing system controls and taking into account both equality and inequality constraints, which are used to model the power balance constraints and various operating limits. It functionally combines the power flow with economic dispatch. In the power world simulator, the optimal solution is determined using linear programming. Once congestion has been mitigated, the new network schedule and relevant network information will be extracted from the power world simulator.

RESULTS AND DISCUSSION

Development and implementation of a multi-agent application for restructured energy markets is presented in this chapter. The developed multi-agent application software simulates the restructured energy markets with accurate results. Further, this is a testament of the multi-agent framework implementation and the effectiveness of dynamic modeling of a multi-agent environment where internal tasks of each agent are executed concurrently with external inputs from the agent world. Successful deployment of the application software coupled with high robustness indicates the relevance and operational level of multi-agent system based application software development. The user can use the software for any size of power system by defining the number of agents in the system and inserting the associated information.

Figure 19 shows a demonstration of the developed software simulation. The Remote Monitoring Agent (RMA) console can be run in the JADE runtime environment, where the developed agents in the framework can be monitored and controlled. The other graphical tools such as the dummy agent, sniffer agent and introspector agent, which are used to monitor, debug and control the MAS programming, can be activated from the RMA. In the figure, the sniffer agent is activated and the successful implementation of the agents' communication is observed.
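Before turning to the simulated scenarios, the market-clearing step performed by the SC/PX agents (aggregating the buy and sell bids and finding the price at which demand meets supply) can be illustrated with a short sketch. This is a minimal, hedged illustration in Python: the bid format, the function names, and the tie-breaking rule for the clearing price (here, the lowest accepted buy price) are assumptions for illustration only and are not taken from the chapter's Java implementation.

```python
def clear_market(buy_bids, sell_bids):
    """buy_bids / sell_bids: lists of (price_in_cents, quantity_in_kw)."""
    demand = sorted(buy_bids, key=lambda b: -b[0])   # highest willingness to pay first
    supply = sorted(sell_bids, key=lambda s: s[0])   # cheapest generation first
    cleared, mcp = 0.0, None
    i = j = 0
    d_left = demand[0][1] if demand else 0.0
    s_left = supply[0][1] if supply else 0.0
    # Walk the aggregated demand and supply curves while buyers still outbid sellers.
    while i < len(demand) and j < len(supply) and demand[i][0] >= supply[j][0]:
        traded = min(d_left, s_left)
        cleared += traded
        mcp = demand[i][0]                           # clearing price convention (assumed)
        d_left -= traded
        s_left -= traded
        if d_left == 0:
            i += 1
            d_left = demand[i][1] if i < len(demand) else 0.0
        if s_left == 0:
            j += 1
            s_left = supply[j][1] if j < len(supply) else 0.0
    return mcp, cleared

# Scenario 1 bids of Table 2 (Load1..Load5 and Pgen1..Pgen3):
buys = [(11, 10), (12, 20), (10, 10), (14, 30), (13, 40)]
sells = [(11, 70), (12, 20), (10, 25)]
print(clear_market(buys, sells))   # -> (11, 95.0): 95 kW cleared at 11 cents
```

Run on the scenario 1 bids, the sketch clears 95 kW at 11 cents, consistent with the scenario 1 outputs reported in Table 2 below; the ISO and PoolCo manager steps that follow the clearing are not modelled here.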

Figure 20. Excess demand at the MCP


Figure 21. Excess supply at the MCP

Several different scenarios of the double sided bidding PoolCo market are simulated and the scenarios are defined as follows: scenario 1 is defined for a case where excess demand is available at the MCP, which is illustrated in Figure 20; scenario 2 is defined for a case where excess supply is available at the MCP, which is illustrated in Figure 21; and scenario 3 is defined for a case where supply and demand are matched at the MCP, as illustrated in Figure 22.

Table 2 shows the above scenarios numerically. In scenario 1, at the market equilibrium, the bidding quantity of Load 1 is 10kW whereas the successful market output is only 5kW. Therefore an additional 5kW of power is necessary for Load 1. Here, the excess demand of 5kW is available at

Figure 22. Perfect supply and demand matching at the MCP


Table 2. Results of different scenarios

Agents    Scenario 1            Scenario 2            Scenario 3
          Input      Output     Input      Output     Input      Output
          P    Q     P    Q     P    Q     P    Q     P    Q     P    Q
Pgen1     11   70    11   70    11   70    11   65    11   70    12   70
Pgen2     12   20    0    0     12   20    0    0     12   20    0    0
Pgen3     10   25    11   25    10   35    11   35    10   20    12   20
Load1     11   10    11   5     11   10    11   10    11   10    0    0
Load2     12   20    11   20    12   20    11   20    12   20    12   20
Load3     10   10    0    0     10   10    0    0     10   10    0    0
Load4     14   30    11   30    14   30    11   30    14   30    12   30
Load5     13   40    11   40    13   40    11   40    13   40    12   40

P - price in cents; Q - quantity in kW.

the market equilibrium. In scenario 2, at the market equilibrium, the bidding quantity of Pgen 1 is 70kW whereas the successful market output is only 65kW. Therefore 5kW of power remains available at Pgen 1. Here, the excess supply of 5kW is available at the market equilibrium. In scenario 3, at the market equilibrium, the bidding quantity of Load 2 is 20kW and the successful market output is also 20kW. Here, the supply and the demand are exactly matched at the market equilibrium.

The agent platform (JADE) used in the software development is a FIPA compliant platform. In the implementation of the agent oriented design, it strictly follows the FIPA standards to ensure interoperability with future FIPA compliant systems as well. JADE is a fully Java coded platform. It shows complete cross platform portability on all the systems when tested on UNIX, Linux, Windows 95, 98, 2000, XP and Vista machines.

CONCLUSION

This chapter presents a multi-agent software development to simulate the ISO/PX operations for a restructured energy market. This is done with an agent oriented programming methodology to deliver customized and powerful application software. The simulation of this software demonstrates the successful development and implementation of the multi-agent framework, the feasibility and effectiveness of a multi-agent platform to model the restructured energy markets, and the roles of the ISO and PX in particular for carrying out the market operations.

This application is a fully cross platform, FIPA compliant software written in the Java language. The application is made up of various Java packages, giving future programmers the ability to work with both readymade pieces of functionality and abstract interfaces for custom and application tasks. Further, the attractive features of Java, in particular its cross platform deployment, security policies and provisions for distributed computing through Remote Method Invocation (RMI) and sockets, have benefited this software development.

ACKNOWLEDGMENT

The funding for this project was received from SERC IEDS programme grant R-263-000-507-306.


KEY TERMS AND DEFINITIONS

Multi-Agent System: A distributed network of intelligent hardware and software agents that work together to achieve a global goal.

Restructuring of Power System: Reform of the vertically integrated utility monopoly power system to transform it into a distributed control power


system which provides competition and open access to all users in the interconnection.

PoolCo Market Model: One of the market models in a restructured power system. PoolCo is a centralized marketplace that clears the market for buyers and sellers according to the bids of sellers and buyers.

Rule-Based System: One of the ways to store and manipulate knowledge so as to interpret information in a useful way.

Distributed Energy Resource: Distributed generation technology together with distributed storage and controllable loads; their combination is referred to as a distributed energy resource.

Coordination Between Agents: In multi-agent systems, an agent usually plays a role with cooperative or competitive behaviours towards other agents. Therefore communication between agents is necessary in a multi-agent system.

Reliability of Power System: Concerns whether sufficient generation and transmission resources are available to meet projected demand, and the status of the system after outages or equipment failures. Reliable power system operation must satisfy voltage constraints and keep power flows within thermal limits.

Section 6
Multi-Agent Learning

Chapter 13
Effects of Shaping a Reward on Multiagent Reinforcement Learning
Sachiyo Arai
Chiba University, Japan

ABSTRACT
The multiagent reinforcement learning approach is now widely applied to cause agents to behave ra-
tionally in a multiagent system. However, due to the complex interactions in a multiagent domain, it is
difficult to decide each agent's fair share of the reward for contributing to the goal achievement.
This chapter reviews a reward shaping problem that defines when and what amount of reward should
be given to agents. We employ keepaway soccer as a typical multiagent continuing task that requires
skilled collaboration between the agents. Shaping the reward structure for this domain is difficult for the
following reasons: i) a continuing task such as keepaway soccer has no explicit goal, and so it is hard
to determine when a reward should be given to the agents, ii) in such a multiagent cooperative task, it
is difficult to fairly share the reward for each agent’s contribution. Through experiments, we found that
reward shaping has a major effect on an agent’s behavior.

DOI: 10.4018/978-1-60566-898-7.ch013

INTRODUCTION

In reinforcement learning problems, agents take sequential actions with the goal of maximizing a time-delayed reward. In this chapter, the design of reward shaping for a continuing task in a multiagent domain is investigated. We use an interesting example, keepaway soccer (Kuhlmann, 2003; Stone, 2002; Stone, 2006), in which a team tries to maintain ball possession by avoiding the opponent's interceptions. The keepaway soccer problem, originally suggested by Stone (2005), provides a basis for discussing various issues of multiagent systems and reinforcement learning problems (Stone, 2006). The difficulties of this problem are twofold, i.e., the state space is continuous and the sense-act cycle is triggered by an event, such as a keeper (learner) getting the ball. Since the learner selects a macro-action which


requires a different time period, it is appropriate to model this problem as a semi-Markov decision process.

To our knowledge, designing the reward function has been left out of reinforcement learning research, even though the reward function introduced by Stone (2005) is commonly used. However, designing the reward function is an important problem (Ng, 2000). As an example, the following are difficulties of designing a reward measure for keepaway. First, it is a continuing task that has no explicit goal to achieve. Second, it is a multiagent cooperative task, in which there exists a reward assignment problem to elicit desirable teamwork. Because of these two features of keepaway, it is hard to define the reward signal of each keeper so as to increase the time of ball possession by the team. It should be noted that rewarding each keeper for increasing its own keeping time does not always lead to increased possession time by the team.

In the case of a continuing task, we can examine a single-agent continuing task such as the pole balancing task, in which one episode consists of a period from the starting state to the failure state. If the task fails, a penalty is given, and this process can be used to evaluate teamwork and individual skills. In contrast, in the case of a multiagent task, which includes both a teammate and at least one opponent, it is hard to tell who contributes to the task. In a multiagent task such as keepaway, it is not always suitable to assign positive rewards to agents according to the number of time cycles of each agent. Appropriately assigning an individual reward to each agent will have a greater effect on cooperation than sharing a common reward within the team. But, if the individual reward is not appropriate, the resulting performance will be worse than that after sharing a common reward. Therefore, assigning an individual reward to each agent can be a double-edged sword. Consequently, our focus is on assigning a reward measure that does not have a harmful effect on multiagent learning.

The rest of this chapter is organized as follows. In the next section, we describe the keepaway soccer domain, and discuss its features from the viewpoint of reinforcement learning. In Section 3, we introduce the reinforcement learning algorithm we applied and our reward design for keepaway. Section 4 shows our experimental results, including the acquired behavior of the agents. In Section 5, we discuss the applicability of our reward design to reinforcement learning tasks. We state our conclusion and future work in Section 6.

PROBLEM DOMAIN

Keepaway Soccer

Keepaway (Stone, 2002) is known as a subtask of RoboCup soccer, and it provides a great basis for discussion of important issues of multiagent systems. Keepaway consists of keepers, who try to keep possession of the ball, and takers, who attempt to take possession of the ball within a limited region. The episode terminates whenever the takers take possession or the ball runs out of the region, and then players are reset for a new episode. When takers keep the ball for more than four cycles of simulation time, they are judged to have gained ball possession successfully.

Figure 1 shows the case of three keepers and two takers (3 vs. 2) playing in a region of size 20×20[m]. Here, keeper K1 currently has the ball, K2 is the closest to K1, and K3 is the next closest, and so on, up to Kn when n keepers exist in the region. In a similar way, T1 is the closest taker to K1, T2 is the next closest one, and so on, up to Tm, when m takers exist in the region.

Macro-Actions

In the RoboCup soccer simulation, each player executes a primitive action, such as a turn (angle), dash (power) or kick (power, angle), every 100[ms].


However, it is difficult to employ these primitive actions when we take a reinforcement learning approach to this domain, because the parameters of the actions and the state variables have continuous values that make the state space very huge and complicated. To avoid the state representation problem, the macro-actions proposed by Stone1 are very helpful, and we employ the following macro-actions.

HoldBall(): Remain stationary while keeping possession of the ball in a position that is as far away from the opponents as possible.
PassBall(k): Kick the ball directly towards keeper k.
GetOpen(): Move to a position that is free from opponents and open for a pass from the ball's current position.
GoToBall(): Intercept a moving ball or move directly towards a stationary ball.
BlockPass(k): Move to a position between the keeper with the ball and keeper k.

Since each macro-action consists of some primitive actions, it requires more than one step (100[ms/step]). Therefore, the keepaway task can be modeled as a semi-Markov decision process (SMDP). In addition, in the case of the RoboCup soccer simulation, it is assumed that noise affects the visual information during the keepaway task. Considering the above features of the task model, the distance to an object from a player is defined by the following equations.

d' = Quantize(exp(Quantize(log(d), q)), 0.1)   (1)

Quantize(V, Q) = rint(V / Q) Q   (2)

Here, d' and d are the quantized value and the exact value of the distance, respectively. The function rint(x) truncates the decimal places of x. Parameter q is set as 0.1 and 0.01 when an object is moving and when it is fixed, respectively. The noise parameter ranges from 1.0 to 10.0. For example, when the distance between players is less than 10.0[m], the noise parameter is set as 1.0, and it is set as 10.0 when the distance is 100.0[m], the most noisy case.

Takers' Policy

In keepaway soccer, the two types of takers' policies shown in Figure 2 have been generally used to see the effects of the reward design. In the case of policy-(i), the takers select GoToBall() to interrupt the keeper's HoldBall() whenever either taker is within a certain distance from the ball. In the case of policy-(ii), the taker who is nearest to the ball selects GoToBall(), and the other taker selects BlockPass(k). Because each taker plays a distinct role in policy-(ii) to intercept a pass, policy-(ii) is more strategic than policy-(i).
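As a quick illustration of Equations (1) and (2) above, the following hedged Python sketch applies the quantization to a distance value. The function names are illustrative, and rint is taken here as rounding to the nearest integer, which is one reading of the chapter's description of rint(x).

```python
import math

def quantize(v, q):
    """Equation (2): Quantize(V, Q) = rint(V / Q) * Q.
    rint is approximated here by round-to-nearest (an assumption)."""
    return round(v / q) * q

def quantized_distance(d, q):
    """Equation (1): d' = Quantize(exp(Quantize(log(d), q)), 0.1).
    Quantizing in log space means far-away objects are sensed more coarsely."""
    return quantize(math.exp(quantize(math.log(d), q)), 0.1)

# e.g. a nearby object vs. a distant one with the "moving object" setting q = 0.1
print(quantized_distance(7.3, 0.1), quantized_distance(73.0, 0.1))
```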

Figure 1. 3 vs. 2 keepaway task in 20 [m] × 20 [m]. (a) Object names; (b) Initial positions


Issues of Reward Shaping

Figure 3 shows the task classification of testbeds for multiagent and reinforcement learning from the viewpoint of the reward design problem. As we mentioned in the previous section, the difficulties of designing a reward are the lack of an explicit goal and the number of agents involved in a task. First, in a continuing task there is no explicit goal to achieve, so the designer cannot tell when the reward should be given to the agent/s. Second, in a multiagent cooperative task, there exists a reward assignment problem of deciding the amount of reward to be allotted to each agent for achieving desirable teamwork. The keepaway task contrasts with a pursuit game, which is a traditional multiagent research testbed that has an explicit common goal. Because the pursuit game is an episodic task, the reward just has to be given when the hunters achieve their goal. In addition, it is easier to assign a hunter's (learner's) reward than the reward of a keeper (learner), because all four hunters definitely contribute to capturing the prey. Therefore, keepaway is classified as a harder task in terms of designing a reward, because we have no clues to define an explicit goal beforehand or to assign a reward to each agent.

From the aspect of a continuing task, we can refer to the case of single-agent continuing tasks, e.g., pole balancing, in which one episode consists of a period from the starting state to the failure state. In such a continuing task, an episode will always end with failure, and the penalty can help the design of a good reward measure to improve both teamwork and individual skills. In contrast, from the aspect of a multiagent task that includes both teammates and opponents, as in keepaway, it is hard to tell who contributed to keeping possession of the ball within the team. In other words, we should consider the case where some keepers contribute and others may not, or an opponent (taker) contributes by taking the ball. What has to be noted is that the episode of a multiagent continuing task ends with someone's failure. This problem has been discussed as credit assignment in time-extended single-agent task and multiagent task domains (Agogino, 2004).

Though the credit assignment issue is closely related to our research here, we design a reward function to evaluate the "last" state-action pair of each agent in the SMDP (semi-Markov decision process) domain, where each agent's action takes a different length of time, instead of assigning a reward to each state-action pair of each agent's whole state-action sequence. Here, we consider the reward design problem that consists of setting the amount of the reward value and the time of reward assignment so that we can optimize the design issues of the reward measure in the multiagent learning process.

Figure 2. Takers' policy: 3 vs. 2 keepaway task in a 20 [m] × 20 [m] region. (i) Always GoToBall; (ii) GoToBall and BlockPass
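The two hand-coded taker policies of Figure 2 can be sketched as below. This is a hedged illustration only: the player representation, the distance helper, and the way the "free" keeper to block is chosen are assumptions and not the benchmark's actual code.

```python
from math import hypot

def dist(a, b):
    """Euclidean distance between two (x, y) positions."""
    return hypot(a[0] - b[0], a[1] - b[1])

def taker_policy_i(my_pos, ball_pos):
    """Policy-(i), Figure 2(i): every taker simply chases the ball."""
    return ("GoToBall",)

def taker_policy_ii(my_pos, other_taker_pos, keeper_positions, ball_pos):
    """Policy-(ii), Figure 2(ii): the taker nearest to the ball chases it,
    while the other taker blocks the pass lane to a keeper (here assumed to
    be the keeper farthest from the ball)."""
    if dist(my_pos, ball_pos) <= dist(other_taker_pos, ball_pos):
        return ("GoToBall",)
    k = max(range(len(keeper_positions)),
            key=lambda i: dist(keeper_positions[i], ball_pos))
    return ("BlockPass", k)
```

Because policy-(ii) assigns distinct roles to the two takers, it pressures the ball holder and the pass lane at the same time, which is why the text calls it the more strategic of the two.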


COMPONENTS OF REINFORCEMENT LEARNING

Learning Algorithm

Sarsa(λ) with replacing eligibility traces (Singh, 1996) has been generally employed as the multiagent learning algorithm because of its robustness within non-MDPs. Here, we take Sarsa(λ) with replacing eligibility traces as the learning algorithm of a keeper, following Stone's approach (Stone, 2005). The list below shows the learning algorithm that we use.

Initialize θ(i, a), e(i, a)
1. for each episode:
2.   for each SMDP step:
3.     the keeper gets s_t from the environment
4.     make up a feature vector F_t(i) from s_t
5.     Q_a = Σ_{i=0}^{N-1} θ(i, a) F_t(i), for all a
6.     select action a_t using Q_a
7.     sense s_{t+1}, r_{t+1}
8.     make up a feature vector F_{t+1}(i) from s_{t+1}
9.     Q_a = Σ_{i=0}^{N-1} θ(i, a) F_{t+1}(i), for all a
10.    select action a_{t+1} using Q_a
11.    for all i:
12.      δ = r_{t+1} + γ Q(s_{t+1}, a_{t+1}) − Q(s_t, a_t)
13.      θ(i, a) = θ(i, a) + α δ e(i, a), for all a
14.      e(i, a) = λ e(i, a), for all a
15.      e(i, a) = F_{t+1}(i), if a = a_{t+1}
16.      e(i, a) = 0, if a ≠ a_{t+1}
17.    s_t ← s_{t+1}
18.    if the episode ends, go to line 2

State Representation: Tile-Coding (Arai, 2006): For the state representation, we introduce a revised version of tile-coding (Sutton, 1998) as a function approximation for continuous state variables, as shown in Figure 4. Linear tile-coding function approximation has been introduced in many reinforcement learning applications to avoid a state explosion. It divides the continuous state space using tilings of arbitrary sizes, and generalizes over similar situations by using multiple overlapping tilings. The number of tilings and the size of each tile are defined by the designer in advance; this definition materially affects learning performance.
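A minimal, hedged Python sketch of this update is given below. It follows the standard Sarsa(λ) with replacing eligibility traces over tile features (Singh, 1996); for simplicity the features are treated as binary (the chapter scales them by ntile/Ntile instead), and the tile indices, the action set and the environment loop are placeholders rather than the keepaway benchmark's actual interfaces.

```python
import numpy as np

N_TILES, N_ACTIONS = 4672, 3                 # tiles as in Section 3.2; actions e.g. HoldBall/PassBall(K2)/PassBall(K3)
ALPHA, GAMMA, LAM, EPS = 0.125, 0.95, 0.0, 0.01

theta = np.zeros((N_TILES, N_ACTIONS))       # weights theta(i, a)
traces = np.zeros((N_TILES, N_ACTIONS))      # eligibility traces e(i, a)
rng = np.random.default_rng(0)

def q(active_tiles, a):
    """Q(s, a) = sum of weights over the active tiles (binary features)."""
    return theta[active_tiles, a].sum()

def epsilon_greedy(active_tiles):
    """Action selection used in lines 6 and 10 of the listing."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([q(active_tiles, a) for a in range(N_ACTIONS)]))

def sarsa_lambda_update(tiles_t, a_t, r_t1, tiles_t1, a_t1, terminal=False):
    """One SMDP step: TD error, replacing traces, gradient-descent weight update."""
    target = r_t1 if terminal else r_t1 + GAMMA * q(tiles_t1, a_t1)
    delta = target - q(tiles_t, a_t)
    traces[tiles_t, :] = 0.0                 # replacing traces: clear other actions on active tiles
    traces[tiles_t, a_t] = 1.0               # set the taken action's active tiles to 1
    theta[:] += ALPHA * delta * traces       # theta <- theta + alpha * delta * e
    traces[:] *= GAMMA * LAM                 # decay traces towards the next step
```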

Figure 3. Classification of testbeds


Figure 4. Function approximations for the state representation

In Figure 4, from lines 4 to 8, the feature vectors Ft and Ft+1 are made by tile-coding. In our experiments, we use primarily single-dimensional tilings: 32 tilings are overlaid with a 1/32 offset, and each tiling is divided into ntile segments. Each segment is called a tile, and each state variable lies in a certain tile, which is called an active tile. In keepaway, the distances between players and the angles are represented as state variables. In the case of 3 vs. 2, the total number of state variables is 13, which consists of 11 distance variables and 2 angle variables. In our experiments, ntile = 10 for each of the 11 distance variables, and ntile = 18 for each of the 2 angle variables. Accordingly, the total number of tiles Ntile is 4672. Each value of the state variables is represented as a feature vector F(i), with Σ_{i=1}^{Ntile} F(i) = 1, and each value of F(i) is given as follows:

F(i) = ntile / Ntile   (if the i-th tile is active)
F(i) = 0               (otherwise)   (3)

Previous Work in Reward Design

For the reward r appearing in line 7 of Figure 2, rs defined by Equation (4) (Stone, 2005) is commonly used for keepaway. Here, CurrentTime is the simulation time when the keeper holds the ball or the episode ends, and LastActionTime is the simulation time when the keeper selects its last action. We hereafter refer to the function of Equation (4) as rs. In this approach, a reward is defined as the amount of time between the last action and the current time (or end time). That is, as the amount of time increases after taking the ball, the amount of reward given to the keeper also increases.

rs = CurrentTime − LastActionTime   (4)

This approach seems reasonable and proper. However, there are some problematic cases with rs, as shown in Figure 5, for example. In Figure 5(b), K1, who is the current ball holder, gets a larger reward than the one in Figure 5(a). Consequently, the keeper passes the ball to the intercepting takers on purpose, because the ball will bounce back directly to K1, and then K1 is paid some reward for selecting this action. This action seems to yield a larger reward than other actions. K1 in Figure 5(d) gets a larger reward than the one in Figure 5(c) when the reward is defined by rs. Consequently, the keeper is likely to pass to the teammate (keeper) who is in the farthest position to get a larger reward, rather than pass to the nearer one. Although it seems reasonable, we cannot tell which one is the better strategy because it depends on the amount of noise and the keepers' skill.
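The tile-coding arithmetic above can be checked with a small sketch: 32 tilings over 11 distance variables (ntile = 10) and 2 angle variables (ntile = 18) give 32 × (11·10 + 2·18) = 4672 tiles, and the active-tile features of Equation (3) sum to 1. This is a hedged illustration; the variable ranges and helper names below are assumptions, not the chapter's implementation.

```python
import numpy as np

N_TILINGS = 32
N_SEGMENTS = [10] * 11 + [18] * 2                 # 11 distance vars, 2 angle vars
N_TILE_TOTAL = N_TILINGS * sum(N_SEGMENTS)        # 32 * (110 + 36) = 4672

def active_tiles(x, lo, hi, n_seg, base):
    """Index of the active tile for one state variable in each of the 32
    single-dimensional tilings, each shifted by 1/32 of a tile width."""
    width = (hi - lo) / n_seg
    idx = []
    for k in range(N_TILINGS):
        shifted = (x - lo) + (k / N_TILINGS) * width
        seg = min(max(int(shifted // width), 0), n_seg - 1)
        idx.append(base + k * n_seg + seg)
    return idx

def feature_vector(values, ranges):
    """Sparse feature vector of Equation (3): F(i) = ntile/Ntile on active tiles."""
    f = np.zeros(N_TILE_TOTAL)
    base = 0
    for x, (lo, hi), n_seg in zip(values, ranges, N_SEGMENTS):
        for i in active_tiles(x, lo, hi, n_seg, base):
            f[i] = n_seg / N_TILE_TOTAL
        base += N_TILINGS * n_seg
    return f

# Illustrative ranges: distances up to the pitch diagonal, angles in degrees.
ranges = [(0.0, 28.3)] * 11 + [(0.0, 180.0)] * 2
values = [5.0] * 11 + [45.0, 90.0]
f = feature_vector(values, ranges)
print(len(f), round(f.sum(), 6))   # 4672, ~1.0
```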


Figure 5. Problematic situations. (a) K1 takes HoldBall(); (b) K1 takes PassBall(); (c) K1->K2->K1; (d) K1->K3->K1

These examples show the difficulty of designing a reward function for a continuing task, as previously mentioned.

Reward Design for Collective Responsibility

In the well-known continuing task of pole balancing, the amount of reward is defined by Equation (5). The agent receives either 0 or -1 for a successful or a failure condition, respectively. We follow the scheme that the reward should make the agent do what we want it to achieve, not how we want it achieved (Sutton, 1998). From this standpoint, the reward value has to be constant during a successful situation such as pole balancing, because we do not know which action achieves the task. While rs by Equation (4) (Stone, 2005) provides a differentially programmed reward for each step, as shown in Figure 6, it is usually difficult to say whether these values become an appropriate indicator for keeping a successful situation. Therefore, we introduce a novel reward function based on a constant reward sequence (Figure 6) to reduce the harmful effects of the reward design on the emerging behavior.

r = -1  (under a failure condition)
r = 0   (otherwise)   (5)

The major difference between the domain of pole balancing and keepaway is the number of agents involved. Unlike the single-agent case, in which one agent is responsible for the failure or success of the task, responsibility is diffused in the multiagent case. However, in the keepaway

Figure 6. Reward sequence in a continuing task


task, specifying which agent causes failure seems much easier than specifying which one contributes to success. Therefore, we design reward functions of agent j, as shown in Equation (6), where tj is given by TaskEndTime − LastActionTime(Kj) when the task fails. Otherwise, the agent receives 0 constantly during a successful situation. The reason for sharing the same reward (= 0) among the agents in the successful situation is that we cannot tell which agent contributes to the task solely from the length of keeping time. Also, we know that if one agent keeps a task longer locally, good teamwork does not always result. In our design, each agent receives the reward f(tj) according to its amount of time, i.e., from taking the ball (LastActionTime) to the end of the task (TaskEndTime), at the end of each episode. We make function f(tj) fulfill f(tj) ≤ 0 to reflect the degree of success of the agents' joint action sequences. Figure 7 shows some examples of function f(tj) by which the keeper who terminates the episode receives the largest penalty (i.e., the smallest reward). Here, the x- and y-axes indicate the length of tj defined by Equation (6) and the value of f(tj), respectively. In our experiments, we use f(tj) = −β^tj and f(tj) = −1/tj as reward functions. For simplicity, we refer to f(tj) = −1/tj as rf in the following sections.

r(tj) = f(tj)  (under a failure condition)
r(tj) = 0      (otherwise)   (6)
tj = TaskEndTime − LastActionTime(Kj)

EXPERIMENT

In this section, we show the empirical results in the keepaway domain. The learning algorithm is shown in Figure 4, and the parameters are set as α = 0.125, γ = 0.95, λ = 0, and ε = 0.01 for ε-greedy. For the noise parameter q, mentioned in Section 2.2, we set q as 10^-5, which is the same setting used by Stone (Stone, 2005) to represent the behavior of a noise-free environment.

Figure 8 shows the learning curves of the five different reward functions introduced in Figure 7. Here, the x- and y-axes indicate the length of training time and the length of keeping time, respectively. We plot the moving average of 100 episodes.

Figure 7. Reward function under different β
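As a compact summary of the reward signals compared in this chapter, the following hedged Python sketch implements rs (Equation 4), the pole-balancing style reward (Equation 5), and the proposed family f(tj) = −β^tj and rf = −1/tj (Equation 6). The argument names mirror the text, tj is assumed to be positive, and the function names are illustrative.

```python
def r_s(current_time, last_action_time):
    """Equation (4) (Stone, 2005): time elapsed since the keeper's last action."""
    return current_time - last_action_time

def r_pole(failure):
    """Equation (5): constant reward sequence, -1 on failure and 0 otherwise."""
    return -1.0 if failure else 0.0

def r_collective(task_end_time, last_action_time, failure, beta=None):
    """Equation (6): 0 while the task continues; on failure, keeper j receives
    f(tj) with tj = TaskEndTime - LastActionTime(Kj).  f(t) = -1/t gives rf;
    passing beta instead gives the alternative family f(t) = -beta**t."""
    if not failure:
        return 0.0
    t_j = task_end_time - last_action_time      # assumed > 0
    return -(beta ** t_j) if beta is not None else -1.0 / t_j

# The keeper whose action immediately precedes the failure (small tj) gets the
# largest penalty: r_collective(100.0, 99.0, True) -> -1.0,
# while an earlier keeper gets r_collective(100.0, 80.0, True) -> -0.05.
```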


Figure 8. Learning curve under the various reward functions (moving average of 100 episodes) against takers' policy (i)

Performances

In the case of f(tj) = −1, which is the same reward function given in the pole balancing problem, the performance declines in the early learning stage, as shown in Figure 8. The major reason for the decline is multiagent learning problems, such as reward assignment and simultaneous learning. By giving −1 to all keepers under failure conditions, keepers selecting both appropriate and inappropriate actions equally receive the penalty of −1. This causes a harmful effect on learning, especially in the initial stage. However, after considerable learning, all keepers learned better actions. We find that the cases of β = 0.1 and β = 1.0 have similar curves in Figure 8, though each value of f(tj) (Figure 7) is totally different. The likely explanation for this similarity lies in the domain of f(tj). When tj > 1, the case of β = 0.1 approaches 0, and the case of β = 1.0 stays at −1, as shown in Figure 7. This indicates that all keepers receive the same reward value, and the reward assignment has harmful effects in both cases.

Figure 9 shows the learning curves for two cases with different reward functions. One is rs and the other is rf = −1/T. Because rf shows the best performance of the keepers, we focus on this reward function, which is based on the time of failure. Figure 7 shows the episode duration of keepers who experienced 25 hours of keepaway training. In Figure 7, keepers who learned by rs possess the ball for up to approximately 100 seconds, while keepers who learned by rf possess the ball for more than 300 seconds.

Emerging Behavior

Effect of Reward Design

We compare the acquired behaviors of both cases, rs and rf, after 25 hours of training. One of the notable differences between the two cases is the timing of the pass. Figure 11(a) and (b) show the behaviors of K1 reinforced by rs and rf, respectively. The keeper reinforced with rf shown in (b) does not pass the ball until the takers become quite close to him, whereas the keeper with rs seems to pass regardless of the takers' location. To examine the difference in pass timing between (a) and (b), we compare the distance from the keeper holding the ball to the nearest taker for each reward function. The distance in the case when using rf is approximately 2 [m] shorter than that in the case using rs, as shown in Table 1. The behavior (i.e., the keeper holds the ball until the takers become quite close to the keeper) means that keepers often select HoldBall(). Then, we focused on the change of the pass frequency within 1 second halfway through the learning process, as shown in Figure 12. Table

Figure 9. Learning curves with the best-performance function and the existing function (moving average of 1000 episodes) against takers' policy (i)

2 shows the pass frequency after learning. Here, we find that the frequency of passing the ball was smaller for rf than for rs.

Effect of Takers' Policy

Here, we discuss the emerged behavior from the viewpoint of the takers' policies introduced in Figure 2(i) and (ii). It seems that the emerged behavior shown in Figure 11(a) is especially effective against the takers with policy-(i). Because both takers always select GoToBall(), keepers K2 and K3 are always free from these takers. Thus, to examine the availability of our reward function, rf, in a different situation where the takers have a more complicated policy, we apply our reward function to the situation where the takers act with policy-(ii). Figure 13 shows the comparison between the two learning curves of our keepers using rf and rs.

When the takers act with policy-(ii), the performance of keepers with rs is worse than for takers with policy-(i). As mentioned in Section 3.2, reward function rf has adverse effects on

Table 1. Pass timing: distance to the nearest taker from the keeper with the ball after 25 hours

              Distance [m]
       Takers' policy-(i)   Takers' policy-(ii)
rs           5.60                 3.50
rf           7.44                 7.38

Table 2. Pass frequency (number of passes during 1 second)

                       Frequency [times/second]
Reward function   Takers' policy-(i)   Takers' policy-(ii)
rf                    1.4 ± 0.1             2.1 ± 0.2
rs                    2.3 ± 0.3             2.5 ± 0.2


Figure 10. Results after 25 hours learning, 0-180 episode duration

Figure 11. Pass timing. (a) rs: Equation(4); (b) rf = -1/tj

Figure 12. Pass frequency halfway through learning

learning. Also, when the takers act with policy-(ii), rf makes keepers possess the ball longer than function rs does. Figure 14 shows the results of the episode duration after keepers experienced 25 hours of training against takers acting with policy-(ii). We found that keepers reinforced by rf could possess the ball at least twice as long as keepers reinforced by rs. However, episode


Figure 13. Learning curves with the best reward function and the existing function (moving average of 1000 episodes) against takers' policy (ii)

Figure 14. Results after 25 hours learning, 0-350 episode duration

duration becomes shorter than that against takers with policy-(i), shown in Figure 2(i), because of the more sophisticated takers' policy. There is not a large difference in pass frequencies between the cases of rf and rs because the takers play distinct roles under policy-(ii); in this case, one of the takers can immediately reach the keeper who receives the ball, and so the keeper with the ball must pass the ball as soon as possible. Therefore, the pass frequency increases. As for the emerged behavior, we find the same tendency (i.e., keepers do not pass the ball until the takers become closer to K1) for both takers' policy-(ii) and takers' policy-(i), as shown in Table 1.

DISCUSSION

This section presents some of the problems in designing the reward function for the keepaway task through experiments. Since keepaway is a continuing and multiagent task, it is hard to decide when and what amount of reward should be given to the learners. From the aspect of a continuing task, we refer to the case of the single-agent continuing task, pole balancing, where failure terminates the task. However, because keepaway is a multiagent task, simultaneous learning problems occur in the early learning stage and we must consider getting high-performance cooperative behavior. The amount


of reward that is given to the agent is defined by Equation (5). The agent receives −1 or 0 for the failure or success condition, respectively.

The difference between pole balancing and keepaway is the number of agents involved. Unlike the single-agent case, in which one agent is responsible for the failure or success of the task, responsibility is diffused in the multiagent case. However, in the keepaway task, specifying the agent causing a failure seems much easier than specifying the agent contributing to the success. Therefore, we design reward functions for the keepaway task so that a keeper who terminates the episode receives a larger penalty (i.e., a smaller reward). Table 3 shows the comparison among three cases of keepaway (hand-coded, and two reward functions). Though the empirical results show that keepers using our function can possess the ball for approximately three times longer than those hand-coded and those learned with the other reward function (Stone, 2005), the reason for this high performance has not been qualitatively analyzed yet.

First, we discuss the reward functions that we introduced. We introduce T = TaskEndTime − LastActionTime and give −1/T to the agent when the task fails. Otherwise, the agent receives 0 constantly during a successful situation, as mentioned in Section 3.4. The reason for sharing the same reward (= 0) among agents in the successful situation is that we cannot identify which agent contributes to the task solely by the length of keeping time. Since it is not always good for a team when one agent keeps a task longer locally, we do not introduce a predefined value of each agent's reward individually. In our design, each agent receives the reward f(tj) according to its amount of time from taking the ball (LastActionTime) to the end of the task (TaskEndTime) at the end of each episode. For the introduced reward functions, rf = −1/tj provides relatively better performance than that of the other functions. Though the function f(tj) = −0.7^tj has a similar curve to rf when tj < 20, as shown in Figure 7, it doesn't perform as well as the case of rf. The main reason for this result is the value range of T. The range of T is always larger than 20, and so the similarity in T < 20 does not have much effect on the performance.

Second, the keeper with rf passes the ball more frequently than the keepers with rs in the earlier stage of learning, as shown in Figure 12. Because the keeper with rs can receive some reward when selecting HoldBall in the early stage, the keeper tends not to pass the ball so many times to the other keeper. Meanwhile, our keepers reinforced by rf do not receive any penalty when they pass the ball; that is, they receive a penalty only when they are intercepted or miss the pass. So, our keepers are not afraid to pass to other keepers. In the middle and late learning stages, the keepers with rs pass the ball frequently because they experience a larger reward using PassBall(k) than

Table 3. Comparison of average possession times (in simulator seconds) for hand-coded and learned policies against two types of takers in a region of 20 [m] × 20 [m]

                       Keep Time [seconds] (±1σ)
Reward function   Takers' policy-(i)   Takers' policy-(ii)
rf                    35.5 ± 1.9            14.0 ± 2.3
rs                    14.2 ± 0.6             7.0 ± 0.6
Hand-coded             8.3 ± 4.7                -


using HoldBall. However, the pass frequency of our keepers decreases because they experience having the ball intercepted or missing the pass after considerable training.

Third, as we described in Section 2.2, the visual information contains some noise. The passed ball often fails to reach the intended destination because of the noise, and so the noise has a large effect on the emerging behavior. Since the action of passing carries some probability of missing or being intercepted, the pass frequency of our keepers learned with rf becomes small. This is considered reasonable and proper behavior in a noisy environment and against takers' policy-(i), shown in Figure 2(i).

Fourth, we look at the effects of the takers' policy. We found in Figure 13 that our keepers with rf against takers' policy-(ii) (Figure 2(ii)) possess the ball less than in the case against takers' policy-(i) (Figure 2(i)). It seems that, as the frequency of the pass increases, the duration of the episode decreases. We found in Table 2 that the frequency of the pass becomes smaller when our keepers learn against takers' policy-(ii) in comparison with takers' policy-(i).

Last, we discuss the macro-actions we currently use. As the pass frequency increases, keepers do not have enough time to move to a position that is free from the opponent and cannot clear a path to let the ball pass from its current position. Consequently, the probability of missing a pass seems to increase. This problem might be resolved by introducing more sophisticated macro-actions such as GetOpen(), and so forth.

CONCLUSION

In this chapter, we discuss the issue of a revised version of tile-coding as the state representation and reward design for multiagent continuing tasks, and introduce an effective reward function for the keepaway domain. Though our experimental results show better performance than that of previous studies of keepaway (Stone, 2006), we are not yet able to provide a theoretical analysis of the results. At present, we have been examining the problem peculiar to a continuing task that terminates at failure, such as pole balancing, and the reward assignment within a multiagent task in which simultaneous learning takes place. For the continuing task case, we show that a certain penalty causes an agent to learn successfully. Whereas, for the multiagent case, we avoid the harmful effect of the agents' simultaneous learning by parameter tuning. It is necessary to analyze the breakdown of the reward, such as which agent gets a greater penalty and which gets a lesser penalty, when designing a multiagent system.

REFERENCES

Agogino, A. K., & Tumer, K. (2004). Unifying Temporal and Structural Credit Assignment Problems. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multi-Agent Systems (pp. 980-987).

Arai, S., & Tanaka, N. (2006). Experimental Analysis of Reward Design for Continuing Task in Multiagent Domains. Journal of Japanese Society for Artificial Intelligence, 13(5), 537-546. (in Japanese)

Kuhlmann, G., & Stone, P. (2003). Progress in learning 3 vs. 2 keepaway. In Proceedings of the RoboCup-2003 Symposium.

Ng, A. Y., & Russell, S. (2000). Algorithms for Inverse Reinforcement Learning. In Proceedings of the 17th International Conference on Machine Learning (pp. 663-670). San Francisco, CA: Morgan Kaufmann.

Singh, S. P., & Sutton, R. S. (1996). Reinforcement Learning with Replacing Eligibility Traces. Machine Learning, 22(1-3), 123-158. doi:10.1007/BF00114726


Stone, P., Kuhlmann, G., Taylor, M. E., & Liu, Y. (2006). Keepaway Soccer: From Machine Learning Testbed to Benchmark. In Noda, I., Jacoff, A., Bredenfeld, A., & Takahashi, Y. (Eds.), RoboCup-2005: Robot Soccer World Cup IX. Berlin: Springer Verlag. doi:10.1007/11780519_9

Stone, P., & Sutton, R. S. (2002). Keepaway Soccer: A machine learning testbed. In Birk, A., Coradeschi, S., & Tadokoro, S. (Eds.), RoboCup-2001: Robot Soccer World Cup V (pp. 214-223). doi:10.1007/3-540-45603-1_22

Stone, P., Sutton, R. S., & Kuhlmann, G. (2005). Reinforcement Learning for RoboCup Soccer Keepaway. Adaptive Behavior, 13(3), 165-188. doi:10.1177/105971230501300301

Sutton, R., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.


KEY TERMS AND DEFINITIONS

Reward Shaping: A technique to make a reinforcement learning agent converge to a successful policy for rational behavior.

Continuing Task: Has no explicit goal to achieve, but the task requires the agent to keep the desirable state(s) as long as possible.

RoboCup Soccer: See http://www.robocup.org/

Keepaway: Consists of keepers, who try to keep possession of the ball, and takers, who attempt to take possession of the ball within a limited region. The episode terminates whenever the takers take possession or the ball runs out of the region, and then players are reset for a new episode.

ENDNOTE

1. Learning To Play Keepaway: http://www.cs.utexas.edu/users/AustinVilla/sim/keepaway/


Chapter 14
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems
Saba Mahmood
School of Electrical Engineering and Computer Science (NUST-SEECS), Pakistan

Azzam ul Asar
Department of Electrical and Electronics Engineering, NWFP University of Engineering and Technology, Pakistan

Hiroki Suguri
Miyagi University, Japan

Hafiz Farooq Ahmad


School of Electrical Engineering and Computer Science (NUST-SEECS), Pakistan

ABSTRACT
In open multiagent systems, individual components act in an autonomous and uncertain manner, thus
making it difficult for the participating agents to interact with one another in a reliable environment.
Trust models have been devised that can create level of certainty for the interacting agents. However,
trust requires reputation information that basically incorporates an agent’s former behaviour. There
are two aspects of a reputation model i.e. reputation creation and its distribution. Dissemination of this
reputation information in highly dynamic environment is an issue and needs attention for a better ap-
proach. We have proposed a swarm intelligence based mechanism whose self-organizing behaviour not
only provides an efficient way of reputation distribution but also involves various sources of information
to compute the reputation value of the participating agents. We have evaluated our system with the help
of a simulation showing the utility gain of agents utilizing the swarm based reputation system. We have utilized an ant net simulator to compute results for the reputation model. The ant simulator is written in C# and utilizes .NET charting capabilities to graphically represent the results.

DOI: 10.4018/978-1-60566-898-7.ch014


INTRODUCTION

Interactions in Human Societies

Agent based systems share a number of characteristics with human societies in terms of interactions, communications and various other factors. Human beings create a perception about another human based upon several factors. For example, if someone bought a product from a seller and that product proves to be good enough on various parameters, the buyer would rate that seller as good compared to any other available seller of the same product. So, in future shopping the buyer will keep this rating in mind before deciding from whom to buy and not to buy. But if a buyer is to experience the interaction with a certain seller for the first time and has no prior knowledge, then the knowledge of peers can be utilized in this scenario to rate a particular seller. For example, if Mr. X bought a product from a seller and was satisfied, another buyer who has no knowledge about the product of that seller can use this information. Thus human beings use the notions of trust and reputation of other humans with whom they want to interact.

Trust in Computer Systems

Multiagent systems (MAS) are composed of individual agents working towards a certain goal. These agents need to interact with one another in order to achieve the goal. However, in open systems it is very difficult to predict the behaviour of the agents. Open systems are characterized by a high degree of dynamism. Thus interactions among the agents require some degree of certainty. The notion of trust has been used in recent years in the field of computer systems to predict the behaviour of agents based upon certain factors. Another term, reputation, is also used, and sometimes both trust and reputation are used interchangeably, but they do differ from one another. Reputation is defined as collected and processed information about one entity's former behaviour as experienced by others, while trust is the measure of willingness to proceed with an action (decision) which places parties at risk of harm, based on an assessment of the risks, rewards and reputation associated with all the parties involved in the given situation.

Several computational and empirical models have been suggested in recent years trying to address various issues of open multiagent systems. Earlier work involved models and mechanisms developed for centralized multiagent systems. However, with the evolution of distributed computing of a decentralized nature, those models proved to be incomplete in addressing certain issues. Models like REGRET and FIRE take into account various sources of information to compute the final reputation value of the agent under observation, and are designed specifically to address the issues of open MAS.

Trust

Trust is a fundamental concern in open distributed systems. Trust forecasts the outcome of interactions among the agents in the system. There are basically two approaches to trust in multiagent systems. Firstly, trust requires that the agents should be endowed with some knowledge in order to calculate the trust value of the interacting agent. A high degree of trust would mean the most probable selection of that agent for interaction purposes. The second approach to trust revolves around the design of protocols and mechanisms of interaction, i.e. the rules of encounter. These interaction mechanisms need to be devised to ensure that those involved can be sure they will gain some utility if they rightly deserve it and a malicious agent cannot tamper with the correct payoff allocation of the mechanism (Schlosser, Voss, Bruckner 2004). The definitions of reputation and trust above depict that for trust management, reputation is required. Reputation creation and distribution is an issue, especially in the case of open multiagent systems.


Thus we can say that the two approaches to trust come under reputation creation and distribution.

Trust has evolved as one of the most recent areas in the domain of information systems. A wide variety of trust and reputation models have been developed in the past few years. Basically, we have divided the models into two areas: centralized and decentralized.

Centralized Reputation Mechanism

Online electronic communities manage the reputation of all the users in a centralized manner, for example eBay and SPORAS.

eBay Model

It is implemented as a centralized system where users can rate the past interactions of other agents and also leave some textual comments about their behaviour. For example in eBay, after an interaction a user can rate its partner on the scale -1, 0 or +1, meaning negative, neutral or positive ratings respectively. These ratings are stored centrally and the reputation value is computed as the sum of those ratings over six months (Huynh, Jennings and Shadbolt 2006).

SPORAS

SPORAS (Maes & Zacharia, 2000) extends the online reputation systems by introducing a new method for rating aggregation. Specifically, instead of storing all the ratings, each time a rating is received it updates the reputation of the involved party using the following algorithm:

1. New users start with a minimum reputation value and they build up reputation during their activity on the system.
2. The reputation value of a user never falls below the reputation of a new user.
3. After each transaction, the reputation values of the involved users are updated according to the feedback provided by the other parties, which reflects their trustworthiness in the latest transaction.
4. Users with very high reputation values experience much smaller rating changes after each update.
5. Ratings must be discounted over time so that the most recent ratings have more weight in the evaluation of a user's reputation.

SPORAS provides a more sophisticated model as compared to the previous model. However, it is designed for a centralized system and is unable to address the issues related to open MAS.

Decentralized Systems

In decentralized systems, each agent can carry out trust evaluation itself without a central authority. The following sections give details of some of the decentralized models.

Jurca and Faltings

Jurca and Faltings introduce a reputation system (Jurca & Faltings 2003) where agents are incentivised to report truthfully about their interaction results. They define a set of broker agents called R agents whose tasks are buying and aggregating reports from other agents and selling reputation information back to them when they need it. All reports about an agent are simply aggregated using the averaging method to produce the reputation value for that agent. Though the agents are distributed in the system, each of them collects and aggregates reputation reports centrally.

TRAVOS

TRAVOS (Huynh, Jennings and Ramchurn 2004) is a model of trust and reputation that can be used to support informed decision making to assure good interactions in a Grid environment that supports virtual organizations. If a group of agents are to

250
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

form a Virtual Organization, then it’s important satisfaction is calculated in terms of Utility Gain
for them to choose the most appropriate partner. (UG). UG for bad and intermittent providers is less
This model is built upon the probability theory. then the good and average ones. The model has
TRAVOS equips an agent (the trustier) with three incorporated the factor of dynamism in order to
methods for assessing the trustworthiness of an- address the changing environment of open MAS.
other agent (the trustee). The dynamism in FIRE is based on population of
First, the trustier can make the assessment agents that cannot exceed a particular threshold
based on the direct interactions it had with the and location of agents with an assumption that
trustee. Second, the trustier can assess the trust- some agents cannot change their location.
worthiness of the trustee based on the opinions
provided by others in the system. Third, the trustier
can assess the trustworthiness of another based on SWARM INTELLIGENCE
a combination of the direct interactions with and
the reputation of the trustee. TRAVOS considers Swarm intelligence (SI) is an artificial intelligence
the behaviour of an agent as a probability that it technique based around the study of collective
will participate in a successful interaction and behaviour in decentralised, self-organised, sys-
a probability that it will perform an unsuccess- tems. The expression “swarm intelligence” was
ful interaction (untrustworthy behaviour). This introduced by Beni & Wang in 1989(Beni, Wang
abstraction of agent behaviour means that in this 1989), in the context of cellular robotic systems.
model the outcome of an interaction is a binary SI systems are typically made up of population
value (successful or not). of agents interacting locally with one another and
the environment. There is no central authority to
The FIRE Model dictate the behaviour of the agents. In fact, the
local interactions among agents lead to emergence
The FIRE model uses wide variety of sources to of a global behaviour. Examples of systems like
compute the reputation value of an agent. These this can be found in nature, including ant colonies,
sources include IR, RR, WR and CR. IR is based bird flocking, animal herding, bacteria molding
on an agent’s personal experience, while WR and fish schooling.
and CR are reputation information reported by When insects work together collaboratively
neighboring agents. RR is a role-based reputa- there is no apparent communciation between
tion and involves some rule-based evaluation them. Infact, they utilize envoirnment as carrier
of an agent’s repute. The model has considered of information. They make certain changes in
a scenario of producers and consumers with an the envoirnement that are sensed by the insects .
assumption of only one type of service availabil- There are two popular techniques in Swarm
ity. The producers (Providers) are categorized as intelligence, i.e Ant Colony Optimization (ACO)
good, ordinary, intermittent and bad, depending and Particle Swarm Optimization (PSO). Ant
upon the quality of service they are rendering. colony algortihm mimic the behvior of simple
The Intermittent and Bad providers are the most ants trying to locate their food source. Ants while
random ones. If consumer agent needs to use the following a certain path lay down a special chemi-
service, it can contact the environment to locate cal called pheromone, other ants follow the same
nearby provider agents. The consumer agent will path by sensing the pheromone concentration.
then select one provider from the list to use its Stigmergy is altering the state of environment
services. The selection process depends upon in a way that it will effect the behavior of others
the reputation model of the agent. Consumer’s for whom environment acts as a stimulus. “Ant

251
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Colony Optimization” is based on the observation associated with the best solution (fitness) it has
that ants will find the shortest path around an ob- achieved so far. (The fitness value is also stored.)
stacle separating their nest from a target such as a This value is called pbest. Another “best” value
piece of candy simmering on a summer sidewalk. that is tracked by the particle swarm optimizer
As ants move around they leave pheromone is the best value, obtained so far by any particle
trails, which dissipate over time and distance. in the neighbors of the particle. This location is
The pheromone intensity at a spot, that is, the called lbest. When a particle takes all the popula-
number of pheromone molecules that a wander- tion as its topological neighbors, the best value
ing ant might encounter, is higher either when is a global best and is called gbest(Covaci 1999).
ants have passed over the spot more recently or
when a greater number of ants have passed over Implementation of SI
the spot. Thus ants following pheromone trails
will tend to congregate simply from the fact Agent Behavior
that the pheromone density increases with each
additional ant that follows the trail. By exploita- It is very hard to predict the behaviour of agents
tion of the positive feedback effect, that is, the in an open multiagent systems that is character-
strengthening of the trail with every additional ant, ized by high degree of uncertainty. In our system
this algorithm is able to solve quite complicated we have limited our model to the successful or
combinatorial problems where the goal is to find unsuccessful interaction among the agents. Swarm
a way to accomplish a task in the fewest number based model considers behaviour of an agent as
of operations. Research on live ants has shown its willingness to carry out a particular interaction.
that when food is placed at some distance from This behaviour is then propagated throughout
the nest, with two paths of unequal length leading the system. If the agent is unable to fulfil certain
to it, they will end up with the swarm following interaction, it automatically gets isolated and is
the shorter path. no longer called for the interaction in future.
If a shorter path is introduced, though, for in-
stance, if an obstacle is removed they are unable Basic Concept
to switch to it. If both paths are of equal length,
the ants will choose one or the other. If two food Our model is based upon the swarm intelligence
sources are offered, with one being a richer source paradigm. Ant colony algorithm that is the subset
than the other, a swarm of ants will choose the of Swarm Intelligence is utilized in our model
richer source if a richer source is offered after the (Figure 1). Ants and insects communicate with
choice has been made, most species are unable to one another indirectly by making changes in the
switch but some species are able to change their environment a process called as stigmergy. Ants
pattern to the better source. If two equal sources while following a particular path towards the
are offered, an ant will choose one or the other food source lay down a special chemical called as
arbitrarily. Particle Swarm Optimization is the pheromone. If it’s the valid path, more ants follow
technique developed by Dr. Eberhart and Dr. the same path thereby increasing the pheromone
Kennedy (Eberhart & Kennedy 2001) inspired by concentration. So the newer generation of ants
social behavior of bird flocking or fish schooling. automatically given an option automatically fol-
In PSO, the potential solutions called particles fly low the path with higher pheromone concentration.
through the problem space by following the cur- In our model there are basically two types of
rent optimum particles. Each particle keeps track ants: unicast and broadcast ants. If no pheromone
of its coordinates in the problem space, which are concentration is available then the broadcast ant

252
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Figure 1. Swarm intelligence reputation model

is sent to all the agents in the MAS. The ant that Modeling Peer Experience
is able to find the valid path towards the food This component basically involves the reputation
source then returns back to the source to complete information from the peers. The peers maintain
the process unicast ants are sent to the path in special tables that contain pheromone information
order to accumulate the pheromone concentration. of its neighbors. If the agent does not contain any
Each agent in our model is equipped with cer- direct experience information. It has to utilize the
tain modules. Reputation calculation requires basi- information from the peers or neighbors. For do-
cally two sources of information that are agent’s ing so, it broadcasts the request to the neighbors
direct experience and agent’s peers or neighbours and records a value against pheromone variable
experience. Our model consists of three modules for the replying agent.
that serve to maintain these sources of information:
namely, experience manager, recommendation Updating
manager and the reputation manager. Each one The changing levels of pheromone concentration
of these is discussed in detail below. capture the dynamic behaviour of agents in Swarm
Intelligence based system. If a path is no longer
Modeling Direct Experience followed by the agents the pheromone level starts
weakening with the passage of time.
This component basically involves the personal Now the pheromone concentration might be
experiences of the agent with the intended agent. high at some other path. If an agent leaves the
The experience is quantified in terms of the phero- current best path, then the agent may opt for
mone value. If the agent interacted with certain some other path based upon available reputation
agent X and that interaction was successful, the information and comes up with another path to-
experiencing agent will automatically record a wards agents that requires evaluation. Based on
value against pheromone variable. This value is this phenomenon the updating components of the
used as agents own personal experience with the model carry out the process. The final updating
agent under consideration. value is calculated at reputation update component.

253
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Reputation Distribution recent changes in the system. The updating process


The proposed mechanism involves basically three follows the pheromone concentration philosophy
modules: i.e., collector, processor and transmitter. of the ant colony algorithm(Ramchurn, Huynh and
Each agent in our model is equipped with these Jennings 2004). If an agent is no longer used for
three modules. The collector module is responsible particular service etc, means that there is some
to collect reputation information from different other better service provider. Thus the value of
sources, the processor basically computes the pheromone variable decreases with the passage
reputation value and the transmitter provides the of time according to some factor. For example
reputation requestor with the required informa- consider a scenario. Let’s say an agent has to
tion (Figure 2). reach to agent six according to Figure 3. There
The collector module receives reputation in- are two paths to reach the destination but one of
formation of a target agent from the neighbors, them provides an optimum solution to reach the
as well as contains some personal reputation intended destination. The agent would use path
value computed by the containing agent based on from node two or node three depending upon the
some past experience with the agent under con- reputation of each node. According to philosophy
sideration. The reputation value from the neighbors of Swarm if the pheromone concentration on the
is placed in a special table. path two is found or is greater then the other op-
The processor module carries out some com- tion; agents would automatically follow that path.
putations on the reputation values gathered by Now the question arises about different levels
the collector module according to the algorithm of pheromone concentration on the two paths.
discussed in the following section and thus finds The answer is very simple in a way that from the
the final reputation value of the agent under biological behavior of ants. If ants reach to the
consideration. Basically the experience manager food source, on its way back it lays down the
and recommendation manager and their updat- chemical on the path so that other ants could sense
ing counter parts together make up the collector it. If for example the ant is unable to reach to the
module in any agent. destination, its choice of the path has failed and
The transmitter module gets invoked whenever it goes for some other path and carries out the
it receives reputation request from other request- same process. An important point to be noted over
ing agents about a particular agent. here is that with the passage of time if the current
The collector module basically consists of two optimum path is not being followed the concentra-
functions, i.e., to get information from different tion of pheromone weakens thereby letting ants
sources plus to update them as well so as to reflect to find some other path.

Figure 2. Reputation distribution modules


Figure 3. A swarm example

254
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

The Reputation Distribution Phenomenon From the above discussion we can further
We describe the reputation distribution phenom- elaborate the functionality of the Experience
enon with the help of connected nodes scenario, manager and the Recommendation manager.
where each node is connected with other, and is Experience Manager contains personal infor-
also subject to dynamism. In this condition, if the mation of each node with its immediate connected
node or agent is directly connected with the target agent or node.
agent, then the peer experience is equal to the Recommendation manager basically captures
personal experience. But if it is not directly con- the learning behavior of the ACO; where by reputa-
nected with the target node or agent the information tion of the target agent or node is evaluated on the
of the neighbours or peers is utilized in order to basis of personal experience of all the intermediate
compute the reputation value of the agent. Each nodes with one another, thus making the target
agent maintains a table containing the pheromone agent as the best possible source of the service.
information of the immediate connected nodes to If for example the target under observation
it. The node learns through the personal experience moves away as it’s the characteristic of the highly
of the neighbouring nodes. Thus the two important dynamic environment, the backward message
sources of information in building of the reputa- won’t be generated. Instead, the pheromone in-
tion value of an agent are done in a very novel formation contained at the last connected node or
manner. This phenomenon in fact also addresses agent would go on decreasing with the passage of
issue of distribution of reputation information. time, thus this information would be propagated
The two types of messages are generated, called to the source agent or node that the current service
as forward message and the backward message. provider is no longer available as its reputation
The forward message invokes the transmitter has decreased. So the source agent or node has
module of the nodes which in turn calls the collec- to now consult any other group of surrounding
tor module to find if the table contains information neighbours or group of nodes in order to evaluate
for the target agent. If yes, the transmitter module and propagate the reputation information, starting
immediately generates the backward message. with the broadcast message.
This backward message basically retraces the Through literature review we found that latest
same path updating the pheromone information model FIRE incorporated personal, certificate
of all the intermediate nodes maintained at the and the witness reputation information. About
collector module. The processor module carries certificate information it is the node or neighbor
out the calculations based upon hop count and that the service providing agent delegates its self
thus assigns the pheromone value before and also to the evaluating agent. It is just like the reference
after the updating processes (Figure 4). we mention in our CV. The biggest dilemma of
this approach is that certificate reputation can

Figure 4. Swarm Algorithm example

255
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

sometimes propagates wrong full information. maintained at each node. Once a backward mes-
If we look at the ant colony algorithm we come sage is accepted by the agent, it deletes its id form
across the conclusion that personal experience that the table to avoid duplication. Also, if it is unable
is the most highly rated reputation information is to receive the backward message within a speci-
used to build the witness or peer information. In fied period of time, its id is treated as stale.
addition, the certificate reputation comes into play
in a different fashion with the generation of the
backward message. If the target agent replies back ThE ALGORIThM
and generates the backward message, it is in fact
delegating a certificate to the last connected agent The algorithm captures the phenomena of repu-
or node to be used by other neighboring nodes. tation distribution using ant colony algorithm.
An important concern in trust and reputation The collector module basically searches for the
models is how to capture the wrongful or deceitful pheromone information of the requested agent.
behaviour of the agents. In our case the deceitful If found, it generates the backward message;
action can be in the form of generation of illegal while in other case, it repeats the same process
backward message, thereby attracting the source and generates the broadcast message for all the
agent to the wrong path towards the destination agents in search of the pheromone information.
agent. Proposed by (Lam and Leung 2004) where Processing module receives the reputation
by a unique id is assigned to each forward ant, values from the collector and carries out the re-
backward ant with the same id is generated that quired calculations as mentioned in the algorithm.
retraces the given path and only the recorded id at Transmitter module responses back to any request
each node is entertained as the backward message generated for the agent to acquire the reputation
so the problem of lie in reputation mechanism can value.
be easily dealt. Lie detection has been one of the
most important aspects of any reputation model.
However, our algorithm can solve this problem Collector Module
and certain other issues related to false reputation {
propagation (Figure 5). Check if the requested
For example the legitimate way to reach to node’s pheromone info available
agent 4 is through 1. But somehow agent 2 starts If available then
generating wrong backward messages making Call Backward()
agent 0 to realize agent 4 through it. The problem Else
can be solved if nodes only entertain the backward Call Broadcast
messages with the id’s recorded in special tables ()
}
Transmitter Module
Figure 5. Lie detection example
{
Call Collector Module
}
Broadcast()
{
Send hello message in
search of pheromone information
to all the agents in the domain

256
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

} percentage of the requests of which the requestor


forward() successfully obtains the evidence, or the reputation
{ value. The FIRE (Patel 2007) model has considered
Assign id to the for- a scenario of producers and consumers with an
ward message assumption of only one type of service availabil-
Call transmitter ity. The producers (Providers) are categorized as
} good, ordinary, intermittent and bad, depending
Backward() upon the quality of service they are rendering.
{ The Intermittent and Bad providers are the most
Retrace nodes in back- random ones. If consumer agent needs to use the
ward direction from the calling service, it can contact the environment to locate
agent node nearby provider agents. The consumer agent will
Call Processor Module then select one provider from the list to use its
} services. The selection process depends upon
Processor Module the reputation model of the agent. Consumer’s
{ satisfaction is calculated in terms of Utility Gain
Check the id of back- (UG). We apply our mechanism to the simulation
ward message environment used by the FIRE model. Consist-
If legal then do the ing of consumers and producers, FIRE model
following steps incorporates various sources of information in
Calculate hop- order to compute the final value. But the issue of
count this reputation information distribution is still a
Divide hopcount big question mark. Also there is no mechanism
by 100 -a- defined in the model that can make it adaptable to
Add 100 to the changes taking place in the open environment.
proabaility value of the node
Add that pro-
abability value of the tar- RESULTS
get node with the probabil-
ity value of other connected Experimental Methodology
nodes -b-
Calculate the We have utilized an ant net simulator to compute
ratio a/b results for the reputation model. The ant simula-
Set the prob- tor is written in c# and utilizes dot net charting
ability of the node to its cur- capabilities to graphically represent the results.
rent value multiplied by the ANT Net uses virtual pheromone tables much like
ratio when an ant follows a path dropping pheromones
} to re-enforce it. The quicker the ants move down
a path the more throughput of ants thus a greater
We can evaluate our proposed scheme through concentration of pheromones. In the same way
two parameters: i.e. hop counts and success rate. pheromone tables in ANT Net allow fast routes
Hop count depicts the number of nodes or agents to score a higher chance of being selected whilst
needed to be followed in order to reach to the the less optimal route scores a low chance of be-
required destination, while success rate is the ing selected.

257
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Every node holds a pheromone table for all This inherent uncertainty in Manets makes it a
other nodes of the network. Each pheromone very suitable candidate for the evaluation of the
table holds a list of table entries containing all proposed reputation model. The reputation model
the connected nodes of the current node. for such environment needs to posses following
properties.
Simulation Results
• It should take variety of sources of infor-
The simulation results are computed in terms of mation to have a more robust and reliable
the average hop counts in various scenarios. The reputation value.
Reputation Model how ever shows its efficiency • Since there is no central authority in the
in different scenarios in terms of the Utility Gain system each agent should be equipped with
(UG) value. Utility Gain can be defined as the level the reputation model in order to interact
of satisfaction an agent gains after interacting with with one another.
any other agent to consume certain services. UG
is inversely proportional to the average number The swarm intelligence based model not only
of hop counts. provides a mechanism to calculate the reputation
value of an agent but in fact also provides the
Experimental Test Bed mechanism that is utilized to disseminate that
reputation information to all the agents in the
To evaluate the proposed reputation model for system, under high dynamism.
multiagent systems, we required an open system We have evaluated our system on the basis of
whose dynamic nature is captured by the model. hop counts only. The basic phenomenon behind
Therefore, we chose mobile adhoc network as a our algorithm is the concentration level of phero-
case study to find the effectiveness of our model. mone. As the level of trust upon agents increases
An adhoc network involves collection of mobile so as the concentration of the pheromone variable
nodes that are dynamically and arbitrary located. contained by every agent in the system. The repu-
The interconnections between the nodes change tation value degrades by the weakening level of
on continual basis. pheromone variable. The number of hop counts
Each node in Mobile adhoc networks(Manets) efficiently captures this thing. As the reputation
decides about its route. In fact, there are no des- gets built around certain agents, the probability
ignated routers and nodes forward packets from of selecting them for any future service becomes
node to node in multi hop fashion. Nodes move- higher. Therefore, the same destination that would
ment implies that current routes become invalid have been achieved by trial and error can now be
in future instances (Huynh, Jennings, Shadbolt consulted for any service efficiently and effec-
2006, Patel 2007). In such an environment quality tively by the inherent capability of the proposed
of Service (QoS) is a big issue in terms of secure algorithm. Previous trust and reputation models
paths, bandwidth consumption, delays and etc. have incorporated various methods to measure
Our reputation model can be applied to Manets the UG attained by agents in the system that de-
in order to show its applicability and robustness. picts the level of satisfaction that an agent gains
The nodes in Manets could be thought as agents in from another after the interaction. We have made
mutliagent systems. Let’s take a simple problem of number of hop counts the very basis of the UG
route discovery in Manets with reputation mecha- in our proposed scheme. As the number of hops
nism. Mobile adhoc networks are open system decreases in different scenarios UG gets increased
and constantly undergo change in agent topology. thus they are inversely proportional to each other.

258
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Figure 6. Ant algorithm set to off

Figure 7. Ant algorithm set to on

Since we are utilizing special case of Manets, When no ant algorithm is utilized, meaning
for evaluation we take average hop counts as the that there is no learning capability in the process,
measure of effectiveness of our model. The lesser thus in order to utilize the same services, agents
the number of hops required in any particular have to seek more agents for further information,
scenario, the more adaptable our system is under thereby increasing the average hop counts.
certain conditions. The adaptability of the system to the dynamic
We tested the scenario with different param- environment is captured by making certain nodes
eters. For example, first we evaluated the system unavailable. Let’s say we have removed node 5,
with ant net algorithm set to off (Figure 6). The 9, and 11 from the system which no longer can
average hop counts found are 5.08. be utilized to propagate the reputation informa-
As compared to when the ant algorithm is set tion. Under such circumstances, the average hop
to on in the same scenario, the average hop counts counts comes up to be 4.89. As compared to the
becomes 4.72 (Figure 7). The reduction in the same situation when the ant algorithm is switched
number of hop counts is the result of the phero- off the average hop count comes up to be 5.43
mone variable value. The agents in the system Capturing dynamism in the open environment
learn about each other’s reputation based on the is one of the important targets of the interactions
previously discussed phenomena. among agents in the open multi agent systems
(Figures 8 and 9). However, even the previous

259
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Figure 8. Capturing dynamism with ant algorithm set to on

Figure 9. Capturing dynamism with ant algorithm set to off

Figure 10. Graphical comparisons between ant algorithm on and off

recent model FIRE was unable to accurately when swarm algorithm is utilized. The average
capture the highly dynamic environment. When hop counts is less showing the system is able to
certain nodes (agents) are removed from the system adapt itself to the changing conditions (Figure 10).
the average number of hop counts increases if no Looping is an important issue in case of Manets
swarm algorithm utilized as compared to the fact in order to detect and correct the duplicate infor-

260
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Figure 11. Looping allowed

Figure 12. Graphical representation of allowed Looping

Figure 13. Looping removed

mation travelling in the network. If looping is The above results show that the swarm intel-
allowed the average hop counts results to be 6.69; ligence based reputation mechanism is able to
while if it is set to off the average hop counts learn with the changes taking place in the system.
computes to be 5.43 (Figures 11 through 14). And can quickly adapt itself to the changing

261
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Figure 14. Graphical comparison between allowed and is allowed looping

Figure 15. Utility gain with no ant algorithm

conditions as opposed to the system when no ant the effectiveness of the proposed model in terms
net algorithm is utilized. of utility gain (Figures 15 and 16).
Compared to previous recent work in the do- From the charts it is visible that while ant al-
main of reputation models, the FIRE model states gorithm is utilized the utility gain increases with
that in order to have a robust reputation the model the same number of interactions as opposed to
should be able to gather information from diverse the fact when no ant algorithm is utilized. Simi-
sources. FIRE model has incorporated personal larly UG in case of changing conditions also shows
experience, witness reputation, rule based and a marked increase showing that the system is able
the certificate reputation. The swarm intelligence to learn and adapt itself. Comparing our results
reputation model inherently incorporates different with the research aims set, we find that the pro-
types of information in order to come up with the posed reputation model truly captures the dyna-
final reputation value of the agent. On top of that, mism in the agent environment. The reputation
these sources of information have learning capa- information is not held centrally and exists in
bility and can adapt themselves to the changing distributed form and finally captures information
nature of open MAS. This fact is apparent from from various sources in order to compute the final
the simulation results. Thus we can further depict reputation value. In previous work, weights were

262
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Figure 16. Utility gain with ant algorithm

Table 1. Percent change in UG of Swarm Model

Number of interactions 21 41 61 81
UG 4 4.2 4.5 4.4
%Gain 5% 7.1% -2.2%

Table 2. Percent change in UG of FIRE Model

Number of interactions 21 41 61 81
UG 6.3 6.8 6.5 6.4
%Gain 7% -4.4% -1.5%

assigned to the different sources of information, the system finding the average hop counts; at 21,
particularly information gained from the neighbors 41, 61 and 81 instances of interactions. The data
is weighed lower due to risk associated with it. is given in Table 1.
In Swarm based system, the weight assigned to As compared to the percentage change in the
the peer information is very logical in a way if Utility Gain of the interactions among agents in
the agent doesn’t reply back in particular period the system using FIRE model to compute the
of time, the agent will loose its reputation value reputation value of the agents given in Table 2.
in the form of evaporating pheromone concentra- By analyzing the percentage change in the
tion, thereby making it less attractive for the agents utility gains of the two systems given same num-
to consult for any information. ber of interactions, we find that Swarm based
system is more adaptable to the dynamism in the
Over All Performance environment as compared to the previous FIRE
model. Thus from above results we can deduce
We evaluated the over all performance of the that utility gain based on hope counts yields more
proposed model by computing the percentage gain value as compared to the Utility Gained by agents
in special case of dynamism. And compared the utilizing FIRE model to compute the reputation
result with that of the percentage gain in the FIRE values.
model. We computed our data from the simulation
results obtained after removing certain agents from

263
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

CONCLUSION AND FUTURE the agents to change value of certain parameters


RESEARCh DIRECTIONS that are used to compute the reputation values.
In this work we have evaluated our proposed
Open multiagent systems continuously undergo algorithm on the basis of average number of hop
changes. In such environment there is need for the counts, and making certain changes in the envi-
participating agents to interact with one another ronment and then comparing the results with the
in a trustworthy manner regarded as the trust. previous recent reputation model called that FIRE
Trust requires creation of the reputation of the judges the learning capability of the algorithm.
participating agents and the distribution of this The output of reputation models are judged on
reputation information that can adapt itself to the basis of Utility Gain that shows the level of
the dynamism in the environment. We have pro- satisfaction an agent gets while interacting with
posed a swarm intelligence based mechanism for one another. It is found that in Swarm based al-
reputation creation and dissemination, specifically gorithms the percentage utility gain is higher as
utilizing the ant colony algorithm. Our model is compared to previous works.
novel in a sense that no existing model of trust has Future work can focus on additional parameters
as yet addressed the issue of reputation creation for performance evaluation such as lie detection,
an information distribution with minimum over- which will determine the authenticity of reputation
heads. Our model has the capability of utilizing information of the agents.
environment itself as the carrier of information.
The inherent optimizing capability of the Swarm
Algorithm makes process of reputation creation REFERENCES
and distribution highly adaptable to the changing
environment. Beni, G., & Wang, J. (1989). Swarm Intelligence
Research in the field of Trust Management is in Cellular Robotic Systems. In Proceed. NATO
still at its infancy stage, newer models are coming Advanced Workshop on Robots and Biological
up however no standard has been widely accepted Systems, Tuscany, Italy, June 26–30
as such. Certain models try to compute their ef- Covaci, S. (1999). Autonomous Agent Technol-
ficiency in terms of the empirical evaluation while ogy. In Proceedings of the 4th international sym-
others restrict their use to certain domains. So posium on Autonomous Decentralized Systems.
there is a requirement of one such model that can Washington, DC: IEEE Computer Science Society.
capture the basic characteristic of the dynamism
of the open Multiagent Systems and also fulfils Huynh, T. D., Jennings, N. R., & Shadbolt, N. R.
the requirements of a reputation model. For the (2006). An integrated trust and reputation model
robust performance of the reputation models, they for open multi-agent systems. Journal of Autono-
should incorporate information about an agent mous agents and multi agent systems.
from various sources so as to achieve a compre-
Jurca, R., & Faltings, B. (2003). Towards Incen-
hensive value that does not depends only upon
tive-Compatible Reputation Management. Trust,
one factor but takes into account various dimen-
Reputation and Security: Theories and Practice
sions. Swarm based model, however successfully
(LNAI 2631, pp. 138-147).
achieves this by incorporating information from
different sources with minimum overheads be- Kennedy, J., & Eberhert, R. C. (2001). Swarm
cause the environment itself is used as the carrier Intelligence. Morgan Kaufmann.
of information. Environment acts as the stimuli to

264
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Lam, K., & Leung, H. (2004). An Adaptive Boukerche, A., & Li, X. (2005). An Agent-based
Strategy for Trust/ Honesty Model in Multi-agent Trust and Reputation Management Scheme for
Semi- competitive Environments. In Proceedings Wireless Sensor Networks. In Global Telecom-
of the 16th IEEE International Conference on Tools munications Conference, 2005. GLOBECOM
with Artificial Intelligence(ICTAI 2004) ‘05. IEEE.
Patel, J. (2007). A Trust and Reputation Model For Fullam, K., Klos, T., Muller, G., Sabater, J.,
Agent-Based Virtual Organizations. Phd thesis in Schlosser, A., Topol, Z., et al. (2005). A Specifi-
the faculty of Engineering and Applied Science cation of the Agent Reputation and Trust (ART)
School of Electronics and Computer Sciences Testbed. In Proc. AAMAS.
University of South Hampton January 2007.
Gunes, M., Sorges, U., & Bouazizi, I. (2002).
Ramchurn, S. D., Huynh, D., & Jennings, N. ARA- the ant Based Routing Algorithm for MA-
R. (2004). Trust in multiagent Systems. The NETs. International workshop on Adhoc network-
Knowledge Engineering Review, 19(1), 1–25. ing (IWAAIN 2002) Vancouver British Columbia
doi:10.1017/S0269888904000116 Canada, August 18-21 2002.
Schlosser, A., Voss, M., & Bruckner, L. (2004). Hughes, T., Denny, J., & Muckelbauer, P. A.
Comparing and evaluating metrics for reputation (2003). Dynamic Trust Applied to Ad Hoc Net-
systems by simulation. Paper presented at RAS- work Resources. Paper presented at 6th Work shop
2004, A Workshop on Reputation in Agent Societ- on Trust, Privacy Deception and Fraud In Agent
ies as part of 2004 IEEE/WIC/ACM International Societies Melbourne 2003
Joint Conference on Intelligent Agent Technology
Kagal, L., Cost, S., Finin, T., & Peng, Y. (2001).
(IAT’04) and Web Intelligence (WI’04), Beijing
A Framework for Distributed Trust Management.
China, September 2004.
Paper presented at the Second Workshop on Norms
Zacharia, G., & Maes, P. (2000). Trust manage- and Institutions in MAS, Autonomous Agents.
ment through reputation mechanisms. Applied
Kennedy, J., & Eberhart, R. C. (2002). Book
Artificial Intelligence Journal, 14(9), 881–908.
Review Swarm Intelligence. Journal of Genetic
doi:10.1080/08839510050144868
Programming and Evolvable Machines, 3(1).
Li, X., Hess, T.J., & Valacich, J.S. (2006). Us-
ing Attitude and Social Influence to Develop an
ADDITIONAL READING
Extended Trust Model for Information Systems.
Abdui-Rahman, A., & Hailes, S. (1997). A Dis- Database for advances in Information Systems.
tributed Trust Model. Paper presented at the 1997 Liu, J., & Issarny, V. (2004). Enhanced Reputa-
New Security Paradigms Workshop. Langdale, tion Mechanism for Mobile Ad Hoc Networks. In
Cumbria UK. ACM. Proceedings of iTrust 2004, Oxford UK.
Amir Pirzada, A., Datta, A., & McDonald, C. Mars, S. P. (1994). Formalising Trust as a Com-
(2004). Trusted Routing in Ad-hoc Networks using putational Concept. Department of Computing
Pheromone Trails. In Congress of Evolutionary Science and Mathematics University of Stirling
Computation, CEC2004. IEEE. April 1994 PhD thesis
Botely, L. (n.d.). Ant Net Simulator. University
of Sussex, UK.

265
Swarm Intelligence Based Reputation Model for Open Multi Agent Systems

Marsh, S., & Meech, J. (2006). Trust in Design. Song, W. (2004). Neural Network-Based Reputa-
National Research Council Canada Institute of tion Model in a Distributed System. In Proceed-
Information Technology. ings of the IEEE International Conference on
E-Commerce Technology. IEEE.
Nurmi, P. (2005). Bayesian game theory in prac-
tice: A framework for online reputation systems. Teacy, W.T.L., Huynh, T.D., Dash, R.K., Jennings,
University of Helsinki Technical report, Series of N.K., & Patel, J. (2006). The ART of IAM: The
Publications C, Report C-2005-10. Winning Strategy for the 2006 Competition.
Pujol, J. M., Sangüesa, R., & Delgado, J. (2002). Teacy, W.T.L., Patel, J., Jennings, N.R., & Luck,
Extracting Reputation in Multi Agent Systems by M. (2005). Coping with Inaccurate Reputation
Means of Social Network Topology. In Proceed- Sources: Experimental Analysis of a Probabilistic
ings of the First International Joint Conference Trust Model. AAMAS’05.
on Autonomous Agents and Multiagent Systems:
Theodorakopoulos, G., & Baras, J. S. (2006).
Part 1 (pp. 467-474). ACM.
On Trust Models and Trust Evaluation Metrics
Sabater, J. (2003). Trust and Reputation for Agent for Ad Hoc Networks. IEEE Journal on Selected
Societies. PhD thesis, University Autonoma de Areas in Communications, 24(2). doi:10.1109/
Barcelona. JSAC.2005.861390
Sabater, J. (2004). Toward a Test-Bed for Trust Xiong, L., & Liu, L. (2003). A Reputation-Based
and Reputation Models. In R. Falcone, K. Bar- Trust Model for Peer-to-Peer eCommerce Com-
ber, J. Sabater, & M. Singh (Eds.), Proc. of the munities. In Proceedings of the IEEE International
AAMAS-2004 Workshop on Trust in Agent Societ- Conference on E-Commerce (CEC’03).
ies (pp. 101-105).
Yamamoto, A., Asahara, D., Itao, T., Tanaka,
Schlosser, A., & Voss, M. (2005). Simulating Data S., & Suda, T. (2004). Distributed Pagerank: A
Dissemination Techniques for Local Reputation Distributed Reputation Model for Open Peer-
Systems. In Proceedings of the Fourth Interna- to-Peer Networks. In Proceedings of the 2004
tional Joint Conference on Autonomous Agents International Symposium on Applications and
and Multiagent Systems (pp. 1173-1174). ACM. the Internet Workshops (SAINTW’04).
Sierra, C., & Debenham, J. (2005). An Informa- Zheng, X., Wu, Z., Chen, H., & Mao, Y. (2006).
tion-based model for trust. AAMAS 05. Utrecht Developing a Composite Trust Model for Multi
Netherlands. agent Systems. In Proceedings of the Fifth Inter-
national Joint Conference on Autonomous Agents
Smith, M.J., & desJardins, M. (2005).A Frame-
and Multiagent Systems (pp. 1257-1259). ACM.
work for Decomposing Reputation in MAS in to
Competence and Integrity. AAMAS’05.

266
267

Chapter 15
Exploitation-Oriented
Learning XoL:
A New Approach to Machine Learning
Based on Trial-and-Error Searches
Kazuteru Miyazaki
National Institution for Academic Degrees and University Evaluation, Japan

ABSTRACT
Exploitation-oriented Learning XoL is a new framework of reinforcement learning. XoL aims to learn
a rational policy whose expected reward per an action is larger than zero, and does not require a so-
phisticated design of the value of a reward signal. In this chapter, as examples of learning systems that
belongs in XoL, we introduce the rationality theorem of profit Sharing (PS), the rationality theorem of
reward sharing in multi-agent PS, and PS-r*. XoL has several features. (1) Though traditional RL sys-
tems require appropriate reward and penalty values, XoL only requires an order of importance among
them. (2) XoL can learn more quickly since it traces successful experiences very strongly. (3) XoL may be
unsuitable for pursuing an optimal policy. The optimal policy can be acquired by the multi-start method
that needs to reset all memories to get a better policy. (4) XoL is effective on the classes beyond MDPs,
since it is a Bellman-free method that does not depend on DP. We show several numerical examples to
confirm these features.

INTRODUCTION sion Processes (MDPs) (Sutton, 1988; Watkins


and Dayan, 1992; Ng et al., 1999; Gosavi, 2004;
The approach, called reinforcement learning (RL), Abbeel and Ng, 2005). We call these methods
is much more focused on goal-directed learning that are based on DP DP-based RL methods. In
from interaction than are other approaches to ma- general, RL uses a reward as a teacher signal for
chine learning (Sutton and Barto, 1998). It is very its learning. The DP-based RL method aims to
attractive since it can use Dynamic Programming optimize its behavior under the values of reward
(DP) to analyze its behavior in the Markov Deci- signals that are designed in advance.
We want to apply RL to many real world
DOI: 10.4018/978-1-60566-898-7.ch015 problems more easily. Though we know some

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Exploitation-oriented Learning XoL

important applications (Merrick et al., 2007), gen- ences very strongly. (3) XoL may be unsuitable
erally speaking, it is difficult to design RL systems for pursuing the optimality. It can be guaranteed
to fit on a real world problem. We think that the by the multi-start method (Miyazaki et al., 1998)
following two reasons concern with it. In the first, that resets all memories to get a better policy. (4)
the interaction will require many trial-and-error XoL is effective on the classes beyond MDPs
searches. In the second, there is no guideline how such that the Partially Observed Markov Deci-
to design the values of reward signals. Though they sion Processes (POMDPs), since it is a method
are not treated as important issues on theoretical that does not depend on DP called a Bellman-free
researches, they are able to be a serious issue in method (Sutton and Barto, 1998).
a real world application. Especially, if we have In this chapter, we focus on the POMDPs
assigned inappropriate values to reward signals, environments where the number of types of a
we will receive an unexpected result (Miyazaki reward is one. As examples of learning systems
and Kobayashi, 2000). We know the Inverse Rein- that belong in XoL at the environments, we intro-
forcement Learning (IRL) (Ng and Russell, 2000; duce the rationality theorem of PS, the rationality
Abbeel and Ng, 2005) as a method related to the theorem of PS in multi-agent environments, and
design problem of the values of reward signals. If PS-r*. We show several numerical examples to
we input our expected policy to the IRL system, support how to use these methods.
it can output a reward function that can realize
the policy. IRL has several theoretical results, i.e.
apprenticeship learning (Abbeel and Ng, 2005) PROBLEM FORMULATIONS
and policy invariance (Ng et at., 1999).
On the other hand, we are interested in the Notations
approach where reward signals are treated in-
dependently and do not require a sophisticated Consider an agent in some unknown environment.
design of the values of them. Furthermore, we aim The agent senses a set of discrete attribute-value
to reduce the number of trial-and-error searches pairs and performs an action in some discrete
through strongly enhancing successful experi- varieties. The environment provides a reward
ences. We call it Exploitation-oriented Learning signal to the agent as a result of some sequence
(XoL). As examples of learning systems that can of an action. We denote the sensory inputs as x,
belong in XoL, we know the rationality theorem y,… and actions as a, b,… . A sensory input and
of Profit Sharing (PS) (Miyazaki et al., 1994), the an action constitute a pair that is termed as a rule.
Rational Policy Making algorithm (Miyazaki et al., We denote the rule “if x then a” as xa. In Profit
1998), the rationality theorem of PS in multi-agent Sharing (PS), a scalar weight, that indicates the
environments (Miyazaki and Kobayashi, 2001), importance of the rule, is assigned to each rule.
the Penalty Avoiding Rational Policy Making The weight of the rule xa is denoted as wxa . The
algorithm (Miyazaki and Kobayashi, 2000) and function that maps sensory inputs to actions is
PS-r* (Miyazaki and Kobayashi, 2003). termed a policy. We call a policy rational if and
XoL has several features. (1) Though tradi- only if the expected reward per an action is
tional RL systems require appropriate values of larger than zero. Furthermore, a useful rational
reward signals, XoL only requires an order of policy is a rational policy that is not inferior to
importance among them. In general, it is easier the random walk (RA) where the agent selects an
than designing their values. (2) XoL can learn action based on the same probability to every
more quickly since it traces successful experi- action in every sensory input. The policy that

268
Exploitation-oriented Learning XoL

Figure 1. A conflict structure


tours (xb) and (ya·za), as shown in Figure 2b).
The rules on a detour may not contribute to obtain
a reward. We term a rule as irrational if and only
if it always exists on a detour. Otherwise, a rule
is termed as rational. An irrational rule should
not be selected when they conflict with a rational
rule.
PS reinforces all the weights of rules on an
episode at the same time when the agent obtains
a reward. We term the number of rules on an
episode as a reinforcement interval. Also, we term
the function that shares a reward among the weights
maximizes the expected reward per action is of rules on an episode as a reinforcement function.
termed as the optimal policy. fi denotes the reinforcement value for the weight
The environment is treated as consisting of of the rule selected at step i before a reward is
stochastic processes, where a sensory input cor- obtained. We assume that the weight of each rule
responds to some state and an action to some state is reinforced by wr = wr + fi for the episode (
transition operator. We show an environment rep- i i

resented by a state transition diagram in Figure 1. rW ··· ri ··· r2 · r1 ), where W denotes the reinforce-
The node with a token denotes a sensory input at ment interval of the episode.
time t. Three rules match the sensory input. Since
the state transition is not deterministic, selection Properties of the Target
of the same rules does not always lead to the same Environments
state. The branching arcs indicate such cases. We
term a part of the state transition diagram around We focus on the POMDPs environments where
one sensory input as a conflict structure. Figure the number of types of a reward is one. In POM-
1 is an example of it. DPs, the agent may sense different states on an
Figure 2a) is an environment consisting of environment as the same sensory input. We call
three sensory inputs, x, y and z, denoted by circles. the sensory input a Partially Observable (PO)
Two actions, a and b, can be selected in each sensory input.
sensory input. An arrow means a state transition We recognize that the learning in POMDPs
by execution of an action described on the arrow. must overcome two deceptive problems (Mi-
We term the rule sequence selected between yazaki et al., 1998). We term the indistinguishable
two rewards, or an initial sensory input and a of state values, that are assigned for each state on
reward, as an episode. For example, when the the environment in POMDPs, as a type 1 confu-
agent selects xb, xa, ya, za, yb, xa, za and yb in sion. Figure 3a) is an example of the type 1 con-
Figure 2a), there exist two episodes (xb·xa·ya·za·yb) fusion. In this example, the state value (v) is es-
and (xa·za·yb), as shown in Figure 2b). We term timated by the minimum number of steps required
the subsequent episode as a detour when the to obtain a reward1. The values for the states 1a
sensory input of the first selection rule and the and 1b are 2 and 8, respectively. Although the
sensory output of the last selection rule are the state 1a and 1b are different states, the agent
same although both rules are different. For ex- senses them as the same sensory input 1 hatched
ample, the episode (xb·xa·ya·za·yb) has two de- in Figure 3. If the agent experiences the state 1a

269
Exploitation-oriented Learning XoL

Figure 2. a) An environment consisting of three sensory inputs and two actions. b) An example of an
episode and a detour

Figure 3. Examples of type 1(a) and type 2(b) Figure 4. Three classes of the target environments
confusions

it will fall into an irrational policy where the agent


and 1b equally likely, the value of the sensory only transits between the states 1a and 2.
2+8 In general, if there is a type 2 confusion in
input 1 becomes 5 (= ). Therefore, the some sensory input, there is a type 1 confusion in
2
value is higher than that of the state 4 (that is 6). it. On account of these confusions, we can clas-
If the agent uses the state values for its learning, sify the target environments into three classes,
it would prefer to move left in the sensory input as shown in Figure 4. MDPs belong to class 1.
. On the other hand, the agent has to move right Q-learning (QL) (Watkins and Dayan, 1992), that
3
in the sensory input 1. This implies that the agent is a representative RL system, is deceived by a
has learned the irrational policy, where it only type 1 confusion since it uses state values to for-
transits between the states 1b and 3. mulate a policy. As PS does not use state values,
We term the indistinguishable of rational and it is not deceived by the confusion. On the other
irrational rules as a type 2 confusion. Figure 3b) hand, RL systems that use the weight (including
is an example of the type 2 confusion. Although QL and PS) might be deceived by a type 2 confu-
the action of moving up in the state 1a is irratio- sion. In the next section, we show the rationality
nal, it is rational in the state 1b. Since the agent theorem of PS that guarantees the acquisition of
senses the states 1a and 1b as the same sensory a rational policy in the POMDPs where there is
input 1, the action of moving up in the sensory no type 2 confusion and the number of types of
input 1 is regarded as rational. If the agent learns a reward is one.
the action of moving right in the sensory input S,

270
Exploitation-oriented Learning XoL

Figure 5. Some reinforcement functions. c) and


RATIONALITY OF PROFIT ShARING
d) satisfy the rationality theorem of PS
Rationality Theorem of Profit Sharing

In this section, we consider the rationality of PS.


We will introduce the following theorem called
the Rationality Theorem of PS.

Theorem 1 (The Rationality


Theorem of PS)

PS can learn a rational policy in the POMDPs


where there is no type 2 confusion and the number
of types of a reward is one if and only if We show another function that satisfies the
theorem (Figure 5d), Func-d).
W
∀i = 1, 2,...W . L ∑ f j < fi −1, (1) Func-d: : g 0 = f0 , g1 = f1 + h, g 2 = f2 − h, g 3 = f3
j =i f2
g 4 +  + gW <
−h
2
1
and gn = gn −1, n = 5, ,W
where L is the upper bound of the number of con- 2
1
flicting rational rules, and W is the upper bound where fn = fn −1
2
of the reinforcement interval. ■ f
0<h < 2
We term Equation(1) as suppression condi- 2
tions. The proof is presented in Appendix A.
Theorem 1 guarantees the rationality that can
learn a rational policy. We cannot determine the
number L in general. However, in practice, we A Multi-Start Method
can set L=M-1, where M is the number of actions. for Profit Sharing
There are several functions that satisfy the
theorem. We present an example of those func- For the class where there is a type 2 confusion,
tions in Figure 5c) and d). It should be noted that we cannot always obtain a rational policy by the
conventional reinforcement functions, such as a rationality theorem of PS. In this case, we should
constant function (Grefenstette, 1988) (Figure experiment with a multi-start method, where each
5a), Func-a) or some decreasing one (Liepins memory is initialized to formulate a new policy.
et al., 1989) (Figure 5b), Func-b), do not satisfy For example, if no reward is obtained within
the theorem. 2k steps, where k is the number of steps from
The following geometrically decreasing func- initial step to the final reward obtaining step,
tion satisfies the theorem (Figure 5c), Func-c). we can establish that there is a type 2 confusion
in the environment. Subsequently, we use the
1 multi-start method to guarantee the rationality in
Func-c : fn = f the environment.
S n −1
where S ≥ L +1 In Expansion of PS to POMDPS, we show
another method to approach to a type 2 confusion.
n = 1, 2, ,W − 1

271
Exploitation-oriented Learning XoL

Figure 6. The problem used in the experiment Figure 7. The state transition diagram used in
the example

Figure 8. The rates of rational policies

Application to a Maze-
like Environment

Setting

We compare PS with QL in the environment that


consists of 10 x 10 honeycomb cells as shown
in Figure 6. There are always 10 fuel pots; the 2: {NF → TURN, SF → MOVE, ON → LEFT}.
capacity of each is 20 liters. The agent consumes The policy 1 is the optimal policy.
one liter of fuel per an action. The agent obtains a
reward when it takes a fuel pot. A fuel pot disap- Results and Discussion
pears when it is taken by the agent, and instantly
appears at a random position on the same vertical In this environment, we change the initial amount
axis. In this environment, there is no type 2 confu- of fuel that the agent has taken. We carried out
sion. The agent can sense three sensory inputs as 100 trials with different random seeds. Figure 8
follows: 1) SF: there is something in the forward shows the rates of rational policies acquired until
direction, 2) NF: there is nothing in the forward the agent has consumed the initial fuel.
direction, and 3) ON: there is something at the Three types of PS are tested in this environ-
position where there is the agent. ment. Two types of PS (“■” and “X” marks in
The agent can perform four actions as follows; Figure 8), that do not satisfy theorem 1, cannot
1) MOVE: move forward, 2) TURN: turn by 60 always learn a rational policy. On the other hand,
degrees clockwise, 3) RIGHT: take a fuel pot in the PS (“■” marks in Figure 8), that satisfies
right hand, and 4) LEFT: take a fuel pot in the theorem 1, guarantees the acquisition of a rational
left hand. LEFT fails with a probability of 50%. policy. In this environment, if the agent takes a
Figure 7 shows the state transition diagram fuel pot once, it can learn a rational policy. There-
of this example. There exist two rational poli- fore, the PS that satisfies theorem 1 can learn a
cies. One is the policy 1: {NF → TURN, SF → rational policy with a high ratio in the case of a
MOVE, ON → RIGHT}, the other is the policy small amount of the initial fuel.

272
Exploitation-oriented Learning XoL

Figure 9. A rule sequence when 3 agents select an action in some order

Figure 10. Three episodes and one detour in Figure 9

The learning rate of QL is 0.02 and its discount senses the environment and performs an action.
rate is 0.8. Though QL guarantees the acquisition The agent senses a set of discrete attribute-value
of the optimal policy, it requires numerous trials. pairs and performs an action in M discrete variet-
In this example, QL requires more initial fuel than ies. We denote agent i’s sensory inputs as xi, yi,⋯
the PS to acquire a rational policy. Furthermore, and its actions as ai, bi,⋯.
though it is not shown in Figure 8, QL requires When the n’th agent (0<n’≤n) has a special
5000 liters to guarantee the acquisition of the sensory input on condition that (n’-1) agents have
optimal policy. special sensory inputs at some time step, the n’th
agent obtains a direct reward R (R>0) and the
other (n-1) agents obtain an indirect reward μR
PROFIT ShARING IN MULTI- (μ≥0). We call the n’th agent the direct-reward
AGENT ENVIRONMENTS agent and the other (n−1) agents indirect-reward
agents. We do not have any information about
Approaches to Multi- the n’ and the special sensory input. Furthermore,
agent Environments nobody knows whether (n-1) agents except for the
n’th agent are important or not. A set of n’ agents
In this section, we consider n (n>1) agents. At that are necessary for obtaining a direct reward
each discrete time step, an agent i (i=1,2,…n) is is termed the goal-agent set. In order to preserve
selected from n agents based on the selection the rationality in the multi-agent environments
probabilities Pi (Pi > 0, ∑ i =1 Pi = 1 , and it
n
all agents in a goal-agent set must learn a rational

273
Exploitation-oriented Learning XoL

policy. When a reward is given to an agent, PS Theorem 2 (The Rationality Theorem


reinforces rules on the episode. In our multi-agent of PS in the Multi-agent Environments)
environments, the episode is interpreted by each
agent. For example, when 3 agents select the rule In the POMDPs where there is no type 2 confu-
sequence (x1a1, x2a2, y2a2, z3a3, x2a2, y1a1, y2b2, z2b2 sion and the number of types of a reward is one,
and x3b3) (Figure 9) and have special sensory inputs any irrational rule in some goal-agent set can be
(for obtaining a reward), it contains the episode suppressed if and only if
(x1a1 ∙ y1a1), (x2a2 ∙ y2a2 ∙ x2a2 ∙ y2b2 ∙ z2b2) and (z3a3
∙ x3b3) for agent 1, 2 and 3, respectively (Figure M −1
m< , (3)
10). In this case, agent 3 obtains a direct reward W 1 W0
and the other agent obtain an indirect reward. We M (1 − ( ) )(n − 1)L,
M
assume that the initial sensory input on the episode
for a direct-reward agent is the same.
where M is the upper bound of conflicting rules
When the agent obtains a reward, we use the
in the same sensory input, L is the upper bound
following reinforcement function that satisfies
of conflicting rational rules, n is the number of
the rationality theorem of PS (theorem 1),
agents, and W and W0 are the upper bounds of the
reinforcement intervals for a direct and indirect-
1
fn = f , n = 1, 2, Wa − 1. (2) reward agents, respectively. ■
M n −1 The proof is presented in Appendix B.
Theorem 2 is derived by avoiding the least
In Equation (2), (f 0,Wa) is (R,W) for the desirable situation where expected reward per an
direct-reward agent and (μR, W0 (W0 £ W)) for action is zero. Therefore, if we use this theorem,
indirect-reward agents, where W and W 0 are we can expect multiple efficient aspects of indi-
reinforcement intervals for direct and indirect- rect rewards including improvement of learning
reward agents, respectively. For example, in speeds and qualities.
Figure 10, the weight of x1a1 and y 1a1 are re- We cannot know the number of L in general.
i n f o r c e d b y ωx a = ωx a + ( M1 )2 µR a n d However, in practice, we can set L=M-1. We can-
1 1 1 1

ωy a = ωy a + 1
µR , respectively. not know the number of W in general. However, in
1 1 1 1 M
practice, we can set μ=0, if a reinforcement interval
Rationality Theorem of PS in is larger than some number that is determined in
the Multi-agent Environments advance, or the assumption of the initial sensory
input for a goal-direct agent has been broken. If
In order to preserve the rationality in the multi- we set L=M-1 and W0 =W where indirect-reward
agent environments discussed in the previous agents have the same reinforcement interval of
section, all irrational rules in a goal-agent set must direct-reward agent, Equation (3) is simplified
be suppressed. On the other hand, if a goal-agent as follows;
set is constructed by the agents that all irrational
rules have been suppressed, we can preserve the 1
m= W
. (4)
rationality. Therefore, we can derive the necessary (M − 1)(n − 1)
and sufficient condition about the range of μ to
suppress all irrational rules in some goal-agent set.

274
Exploitation-oriented Learning XoL

Figure 11. Roulette-like environments

Application to Roulette- two agents learn the policy ‘move left in the initial
like Environments position’ or ‘move right in the initial position, and
move left in the right side of the initial position’, it
Setting is an irrational. When the optimal policy does not
have been destroyed in 100 episodes, the learning
Consider the roulette-like environments in Figure is judged to be successful. We will stop the learn-
11. There are 3 and 4 learning agents in the roulette ing if agent 0, 1 and 2 learn the policy ‘move left
a) and b), respectively. The initial position of an in the initial position’ or the number of actions is
agent i(Ai) is Si. The number shown in the center larger than 10 thousand. Initially, we set W=3. If
of both the roulettes (from 0 to 8 or 11) is given the length of an episode is larger than 3, we set
to each agent as a sensory input. There are two μ=0. From Equation (4), we set μ<0.0714… for
actions for each agent; move right (20% failure) the roulette a) and μ<0.0333… for the roulette b)
or move left (50% failure). If an action fails, the to preserve theorem 2.
agent cannot move. There is no situation where
another agent gets the same sensory input. At each Results and Discussion
discrete time step, Ai is selected based on the
selection probabilities Pi (Pi > 0, ∑ i =1 Pi = 1) .
n
We show the quality, that is evaluated by acquiring
times of an irrational or the optimal policies in a
(P0,P1,P2) is (0.9,0.05,0.05) for the roulette a),
thousand different trials where random seeds are
and (P0,P1,P2,P3) is (0.72,0.04,0.04,0.2) for the
changing, and the speed, that is evaluated by total
roulette b). When Ai reaches the goal i(Gi), Pi sets
action numbers to learn a thousand the optimal
0.0 and Pj (j≠i) are modified proportionally.
policies in Figure 12. Figure 12a) and 1b) are
When Ar reaches Gr on condition that Ai (i≠R)
the results of the roulette a) and b), respectively.
have reached Gi, the direct reward R(=100.0) is
Figure 13a) and b) are details of the speeds in the
given to AR and the indirect rewards 𝜇R are given
roulette a) and b), respectively.
to the other agents. When some agent obtains the
Though theorem 2 satisfies the rationality, it
direct reward or Ai reaches Gj (j≠i), all agents
does not guarantee the optimality. However, in
return to the initial position shown in Figure 11.
both the roulettes, the optimal policy always has
The initial weights for all rules are 100.0.
been learned beyond the range of theorem 2.
If all agents learn the policy ‘move right in
any sensory input’, it is the optimal. If at least

275
Exploitation-oriented Learning XoL

Figure 12. The results of the roulette-like environments in Figure 11

Figure 13. Details of the learning speeds in the roulette a) and b)

In the roulette a), μ=0.3 makes the learning the other hand, if we set μ≥0.3, there is a case that
speed the best (Figure 12a)), Figure 13a)). On irrational policies have been learned. It is an
the other hand, if we set μ≥0.4, there is a case important property of the indirect reward that the
that irrational policies have been learned. For learning qualities exceed those of the case of μ=0.
example, consider the case that A0, A1 and A2 in Though theorem 2 only guarantees the rational-
the roulette a) get three rule sequences in Figure ity, numerical examples show that it is possible to
14. In this case, if we set μ=1.0, A0, A1 and A2 improve the learning speeds and qualities.
approach to G2,G0 and G1, respectively. If we set
μ<0.0714..., such irrational policies do not have
been learned. Furthermore, we have improved the ExPANSION OF PS TO POMDPS
learning speeds. Though it is possible to improve
the learning speeds beyond the range of theorem 2, Approaches to POMDPs
we should preserve it to guarantee the rationality
in any environment. The traditional approach to POMDPs is the
In the roulette b), A3 cannot learn anything memory-based approach (Chrisman, 1992; Ma-
because there is no G3. Therefore, if we set μ=0, Callum, 1995; Boutilier et al., 1996) that uses
the optimal policy does not have been learned the history of sensor-action pairs or a model to
(Figure 12b)). In this case, we should use the identify the environmental states corresponding a
indirect reward. Figure 12b) and Figure 13b) show partially observable (PO) sensory input. Although
that μ=0.2 makes the learning speed the best. On the memory-based approach can attain the opti-

276
Exploitation-oriented Learning XoL

Figure 14. An example of rule sequences in the roulette a)

mality, it is hardware intensive since it requires If the agent selects action-a in S0 and S1, it can
a huge memory. obtain a reward in 3 steps, that is the minimum
To resolve the problem using the memory- number of steps required to obtain a reward. On
based approach, a stochastic policy (Singh et al., the other hand, if the agent selects the action-b
1994) is proposed, where the agent selects an in S0, it requires 4 steps to obtain a reward that
action based on the non-zero probability of every is the same as RA. When we improved RA using
action in every sensory input in order to escape the the Stochastic Gradient Ascent (SGA) (Kimura et
PO sensory inputs. The simplest stochastic policy al., 1995), that is a type of hill-climbing methods,
is the random walk (RA) that assigns the same the average number of steps required to obtain a
probability to every action. On the other hand, reward was 3.78 in 100 trials. Although 25 trials
the existing RL systems of learning a stochastic in 100 trials were able to improve RA, 73 trials
policy (Williams, 1992; Jaakkola et al., 1994; were the same as RA and the other 2 trials resulted
Kimura et al., 1995; Baird et al., 1999; Konda et in a deteriorated RA.2
al., 2000; Sutton et al., 2000; Aberdeen and Baxter, The hill-climbing methods that were used
2002; Perkins, 2002) are types of hill-climbing previous RL systems resulted in the stochastic
methods. They are often used in POMDPs since policy function being learned well in many cases.
they can attain a local optimum. However, they However, once they converged to a local optimum
cannot always improve RA. Furthermore, we i.e., a policy worse than RA, it is not possible for
know of a case where they change for a policy them to improve it. This implies that they change
worse than RA. for a policy worse than RA. To avoid the fault,
For example, in Figure 15, the average num- we focus on PS that belongs in XoL and is not a
ber of steps required obtain a reward by RA is 4. hill-climbing approach.

277
Exploitation-oriented Learning XoL

Figure 15. The environment demonstrating the limit of SGA

Figure 16. The PS-r algorithm

The PS-r Algorithm points. If a rule is regarded as rational, r of the


rule is updated to 1, which means that is a rational
Proposition of PS-r rule. Once r is set to 1, this value is maintained
throughout. While learning is in progress, an action
The rationality theorem of PS guarantees rational- is selected based on RA. Therefore, in practice,
ity in which there is no type 2 confusion. In the PS-r can determine all rational rules. When we
first, we propose PS-r to analyze the behavior of have evaluated or utilized a policy that is learned
PS in which there is a type 2 confusion. by PS-r termed Policy(PS-r), an action is selected
The principal objective of the rationality theo- based on the roulette selection in proportion to r.
rem is determining the rational rules. This can be The rationality theorem of PS is always rein-
implemented by the 1st memory that is an array forced by a rational rule as opposed to an irratio-
whose length is the number of sensory inputs. nal rule in the same sensory input. On the other
After an action is selected, the action is described hand, PS-r is an abstract algorithm of PS where-
in the current sensory input on the 1st memory. If in the values of all rules are evaluated by r or 1.
the agent obtains a reward, the contents of the 1st
memory are rational rules. PS-r is an algorithm to Properties of PS-r
learn a rational policy by the 1st memory.
The PS-r algorithm is shown in Figure 16. At If there is no type 2 confusion in an environment,
the beginning of learning, all rules are regarded it is evident that the average number of steps
as irrational and are initialized by r (0< r < 1) required to obtain a reward by Policy(PS-r) is

278
Exploitation-oriented Learning XoL

not larger than that of RA. On the other hand, we however, it is also possible for them to deteriorate
can derive the following theorem in the POMDPs it. On the other hand, we do not have to select an
where there is a type 2 confusion and the number irrational rule in the non-PO ( Ø PO) sensory in-
of types of a reward is one. puts. Therefore, we only select a rational rule in
them and follow a stochastic policy in the other
Theorem 3 (Comparison between sensory inputs. In particular, we use RA as the
PS-r and RA in the POMDPs) stochastic policy to avoid change for a policy
worse than RA.
The maximum value of the average number of To implement the above idea, it is important
steps to obtain a reward by Policy(PS-r) divided to judge whether sensory inputs are PO. If a sen-
by that of RA in the POMDPs where the number sory input is Ø PO, each transition probability to
of types of a reward is one is given by one of the following sensory inputs by a rule that
has been selected on the sensory input converges
M −1 n to some constant value, even if an action is se-
(1 + )
r lected based on different policies. On the other
r (5)
Mn hand, if a sensory input is PO, the transition prob-
ability will be changed depending on the policy
where n is the upper bound of the number of dif- that is used to reach the sensory input. We aim at
ferent environmental states that are sensed as the judging whether sensory inputs are PO, by com-
same sensory input. ■ paring them with the transition probabilities be-
The proof is presented in Appendix C. tween RA and the other policy.
Theorem 3 is derived from the worst case, The comparison is executed using the χ2-
where an environment is constructed by the most goodness-of-fit test. It only requires transition
difficult environmental structure termed the struc- probabilities to the following sensory inputs by
ture W (see Figure 24 in Appendix C) only. all rules. Therefore, it requires a memory of
Therefore, if there is no structure W in an envi- O(MN2), where M and N are the numbers of ac-
ronment, the behavior of Policy(PS-r) will be tions and sensory inputs, that is less than previous
better than that estimated using Equation (5). memory-based approaches. After the test of a rule
Furthermore, when there is no type 2 confusion based on each RA, and provided the other policy
in some part of an environment and there is an enough sampling, if a transition probability to one
irrational rule in it, its behavior will increasingly of the following sensory inputs is not coincident
improve. between RA and the other policy, we can regard
the sensory input in which the rule can be se-
From PS-r to PS-r* lected as PO. Otherwise, if all the transition prob-
abilities are coincident, the sensory input can be
Improvement of PS-r to fit on POMDPs regarded as Ø PO. It should be noted that although
it is possible not to determine a part of the PO
If we do not identify the environmental states that sensory inputs, it could be resolved by changing
correspond to a PO sensory input, we should use the policy that has been compared with RA.
a stochastic policy to escape from the PO sen- In general, we require several actions in order
sory inputs. Although the simplest stochastic to execute a correct χ2-goodness-of-fit test. When
policy is RA, the existing RL systems to learn a we set the significant level and detection power
stochastic policy cannot always improve RA; as a and 1 − β, respectively, the number of ac-
tions (n) required to achieve the correct test re-

279
Exploitation-oriented Learning XoL

garding a transition by a rule can be statistically In the test mode, if the agent senses an un-
estimated by the following criteria; known sensory input, it returns to the learning
mode. Otherwise, an action is selected based on
 
2
Policy(PS-r*(test)) that is the policy learned in the
1  u(α) + u(2β )  learning mode. Usually, an action is selected by the
n =   (6)
2  sin −1
π1 − sin −1
π2  roulette selection in proportion to r in Policy(PS-
r*(test)). However, we can use another policy
such that some action is not selected to improve
where π1 and π2 are transition probabilities of the
the accuracy of the χ2-goodness-of-fit test.
following sensory input by the rule when RA and
If NofR(test) for a rule that can be selected by
the other policy are used, respectively. In addition,
the sensory inputs, where they do not decide
u()is derived using a normal distribution table,
whether PO or Ø PO is larger than CV, the χ2-
for example, by setting a = 0.05 and β = 0.10, u(
goodness-of-fit test between NofT(learning) and
a ) = 1.960 and u(2β)=1.282.
NofT(test) for the rule is executed. If the result of
the test indicates that they are not coincident, the
The PS-r* Algorithm
sensory input that can select the rule is PO. Oth-
erwise, a Ø PO-judge flag for the rule will be
We propose PS-r* to implement the above idea.
raised. Subsequently, if all the Ø PO-judge flags
We show the algorithm in Figure 17. It is mainly
for rules that can be selected on the same sen-
divided into the learning mode and the test mode.
sory input are raised, the sensory input is Ø PO.
It requires the following five types of memory:
PS-r* is stopped when all NofR(test) are larger
the 1st memory to determine rational rules that
than CV.
are the same as PS-r, Ø PO-judge flags to judge
When we have evaluated or utilized a policy
whether sensory inputs, whose length are the
that is learned by PS-r* termed Policy(PS-r), an
number of a rule, are PO, PO flags to store the
action is selected based on the roulette selection
result of the judgment (PO/ Ø PO/unknown) re-
in proportion to r, where r is set to 0 if it is less
garding the sensory inputs that are unknown
than 1 in the Ø PO sensory inputs. Therefore,
during initialization, the number of ways of select-
Policy(PS-r*) is coincident to RA if all the sen-
ing each rule in two modes (NofR(learning) and
sory inputs are PO. On the other hand, if there
NofR(test)) that requires a memory of O(MN),
exist several irrational rules in the Ø PO sensory
and the number of transitions to each following
inputs, Policy(PS-r*) increasingly better than RA.
sensory input by each rule in two modes
(NofT(learning) and NofT(test)) that requires a
Features of PS-r*
memory of O(MN2).
PS-r* starts from the learning mode. In the
Policy(PS-r*) is coincident to RA in the PO sensory
learning mode, the agent selects an action based
inputs. Therefore, if all sensory inputs are PO, it
on RA to determine all rational rules. Rational
is the most difficult task for PS-r*.
rules are determined using the same algorithm as
On the other hand, Policy(PS-r*) selects a
PS-r. If all NofR(learning) are larger than CV that
rational rule in the Ø PO sensory inputs. Therefore,
is calculated by the upper bound of Equation (6),
if there exist many irrational rules in the Ø PO
the mode is changed to the test one. If we set a
sensory inputs, Policy(PS-r*) is increasingly bet-
= 0.05, β = 0.10, and max|π1−π2 |=0.05, which
ter than RA. They are very important properties
means that the maximum error of estimation of a
of PS-r* that are not guaranteed in PS-r and the
transition probability is 0.05, CV is 2059.09.

280
Exploitation-oriented Learning XoL

Figure 17. The PS-r* algorithm

existing RL systems when learning a stochastic this environment, the different environmental
policy. states Za,Zb,Zc, and Zd are sensed as the same
PS-r* requires a memory of O(MN2). It is larger sensory input Z. In addition, it adopts the structure
than PS-r, that only requires a memory of O(MN). W , that is the most difficult environmental struc-
However, this value is much smaller than those ture, in the sensory input Z. After the agent selects
of previous memory-based approaches. action-a in sensory input X, it moves the sensory
inputs S1 and X with p and 1-p probabilities, re-
Application to the Most Difficult spectively. States from S1 to Sn are sensed as n
Environmental Structure different sensory inputs. If we adjust n and p, the
average number of steps required to obtain a re-
Setting ward can be changed.

We compare PS-r* with RA, PS-r, and SGA in


the environment that is shown in Figure 18. In

281
Exploitation-oriented Learning XoL

Figure 18. The environment in which PS-r* is compared with RA, PS-r, and SGA

Comparison between RA and PS-r standard deviation of Policy(PS-r*) and SGA in


the left side of Table 1. We have set n=7, n=14,
We can calculate the average number of steps and n=21. Furthermore, we list the speed, that
required to obtain a reward by RA and Policy(PS- is the average number of steps required to reach
 1   the quality and its standard deviation in the right
r) as  + 1 + (2n ) + (16) and
 p   side of Table 1.
  
PS-r* is better than SGA with regards to the
  4 
 1  (1 + r )  quality, though it is worse than SGA in the speed.
 + r  + ((1 + r )n ) +   , r e s p e c -
 p   r 3  If n is larger, the quality of Policy(PS-r*) is much
   better than SGA. This is because SGA selects an

t i v e l y. T h e r e f o r e , i f w e s e t irrational rule at the sensory input Si that does not
   need a stochastic policy. On the other hand, an
(1 + r )
4
 1 
n >  + (r − 17) , Policy(PS-r) irrational rule is never selected by PS-r* at the
 1 − r  r 3

   sensory inputs. It is a very important property of

is better than RA. PS-r* in comparison with the previous RL systems
Policy(PS-r*) coincides with RA when the to learn a stochastic policy.
χ -goodness-of-fit test is not executed. If the test
2

is completed, the average number of steps required


1  CONCLUSION
to obtain a reward for Policy(PS-r*) is  + n + 16
p 
  In this chapter, we have proposed Exploitation-
, that is better than both RA and PS-r. oriented Learning (XoL) that is a new approach
to goal-directed learning from interaction. XoL
Comparison with SGA does not require a sophisticated design of the value
of a reward signal and aims to pursue the ratio-
Next, we compare PS-r* with SGA empirically. nality that can obtain a reward very quickly. We
The learning parameter and discounted rate of have focused on the Partially Observed Markov
SGA are 0.1 and 0.99, respectively. The initial Decision Processes (POMDPs) environments and
policy given to SGA is RA. In PS-r*, since we classified these difficulties into a type 1 and type 2
set a = 0.05, β = 0.10, and max|π1−π2 |=0.05, CV confusions. We have defined that a type 2 confu-
is 2095.09. sion is the most difficult property in POMDPs.
We list the quality, that is the average num-
ber of steps required to obtain a reward and its

282
Exploitation-oriented Learning XoL

Table 1. The results of the comparison with PS-r* and SGA in Figure 18

The Quality The Speed


n Ave. S.D. Ave. S.D.
7 24.2 0.218 2.38×10 4
4.10×104
PS-r* 14 31.2 0.237 1.73×105 2.48×104
21 38.2 0.237 2.19×10 5
6.77×104
7 26.3 6.83 2.21×103 2.91×103
SGA 14 38.0 10.9 4.27×103 4.67×103
21 50.8 9.47 4.42×10 3
5.47×103

For the POMDPs environments where there is Aberdeen, D., & Baxter, J. (2002). Scalable Inter-
no type 2 confusion and the number of types of nal-State Policy-Gradient Methods for POMDPs.
a reward is one, we have proved the Rationality In Proceedings of the Nineteenth International
Theorem of Profit Sharing (PS). Next, we have Conference on Machine Learning (pp. 3-10).
proved the Rationality Theorem of PS in multi-
Baird, L., & Poole, D. (1999). Gradient Descent
agent environments. Last, we have analyzed the
for General Reinforcement Learning. Advances
behavior of PS-r, that is an abstract algorithm
in Neural Information Processing Systems, 11,
of PS, in the POMDPs. Furthermore, we have
968–974.
proposed PS-r*, that is an extended algorithm of
PS-r, to fit on the POMDPs environments where Boutilier, C., & Poole, D. (1996). Computing
there is a type 2 confusion and the number of Optimal Policies for Partially Observable Deci-
types of a reward is one. sion Processes using Compact Representations. In
We have shown several numerical examples Proceedings of the Thirteenth National Conference
to support how to use these XoL methods. Also, on Artificial Intelligence (pp. 1168-1175).
we have shown that the performance of PS-r*
Chrisman, L. (1992). Reinforcement Learning
is not less than that of random walk (RA) and it
with Perceptual Aliasing: The Perceptual Dis-
exhibits exceptional potential to improve RA using
tinctions Approach. In Proceedings of the Tenth
a lower memory than the previous memory-based
National Conference on Artificial Intelligence
approaches.
(pp. 183-188).
Our future projects include: improving RA
in PS-r*, extending XoL to multi-dimensional Gosavi, A. (2004). A Reinforcement Learn-
reward and penalty environments and discover- ing Algorithm Based on Policy Iteration for
ing efficient real-world applications, and so on. Average Reward: Empirical Results with
Yield Management and Convergence Analysis.
Machine Learning, 55, 5–29. doi:10.1023/
REFERENCES B:MACH.0000019802.64038.6c

Abbeel, P., & Ng, A. Y. (2005). Exploration and Grefenstette, J. J. (1988). Credit Assignment
apprenticeship learning in reinforcement learning. in Rule Discovery Systems Based on Genetic
In Proceedings of the Twentyfirst International Algorithms. Machine Learning, 3, 225–245.
Conference on Machine Learning (pp. 1-8). doi:10.1007/BF00113898

283
Exploitation-oriented Learning XoL

Jaakkola, T., Singh, S. P., & Jordan, M. I. (1994). Miyazaki, K., & Kobayashi, S. (2001). Rational-
Reinforcement Learning Algorithm for Partially ity of Reward Sharing in Multi-agent Reinforce-
Observable Markov Decision Problems. Advances ment Learning. New Generation Computing, 91,
in Neural Information Processing Systems, 7, 157–172. doi:10.1007/BF03037252
345–352.
Miyazaki, K., & Kobayashi, S. (2003). An Ex-
Kimura, H., Yamamura, M., & Kobayashi, S. tension of Profit Sharing to Partially Observable
(1995). Reinforcement Learning by Stochastic Markov Decision Processes: Proposition of PS-r*
Hill Climbing on Discounted Reward. In Proceed- and its Evaluation. [in Japanese]. Journal of the
ings of the Twelfth International Conference on Japanese Society for Artificial Intelligence, 18(5),
Machine Learning (pp. 295-303). 286–296. doi:10.1527/tjsai.18.286
Konda, V. R., & Tsitsiklis, J. N. (2000). Actor- Miyazaki, K., Yamaumra, M., & Kobayashi, S.
Critic Algorithms. Advances in Neural Informa- (1994). On the Rationality of Profit Sharing in
tion Processing Systems, 12, 1008–1014. Reinforcement Learning. In Proceedings of the
Third International Conference on Fuzzy Logic,
Liepins, G. E., Hilliard, M. R., Palmer, M., &
Neural Nets and Soft Computing (pp. 285-288).
Rangarajan, G. (1989). Alternatives for Classi-
fier System Credit Assignment. In Proceedings Ng, A. Y., Harada, D., & Russell, S. J. (1999).
of the Eleventh International Joint Conference on Policy Invariance Under Reward Transforma-
Artificial Intelligent (pp. 756-761). tions: Theory and Application to Reward Shap-
ing. In Proceedings of the Sixteenth International
McCallum, R. A. (1995). Instance-Based Utile
Conference on Machine Learning (pp. 278-287).
Distinctions for Reinforcement Learning with
Hidden State. In Proceedings of the Twelfth Ng, A. Y., & Russell, S. J. (2000). Algorithms for
International Conference on Machine Learning Inverse Reinforcement Learning. In Proceedings
(pp. 387-395). of the Seventeenth International Conference on
Machine Learning (pp. 663-670).
Merrick, K., & Maher, M. L. (2007). Motivated
Reinforcement Learning for Adaptive Characters Perkins, T. J. (2002). Reinforcement Learning for
in Open-Ended Simulation Games. In Proceedings POMDPs based on Action Values and Stochastic
of the International Conference on Advanced in Optimization. In Proceedings of the Eighteenth
Computer Entertainment Technology (pp. 127- National Conference on Artificial Intelligence
134). (pp. 199-204).
Miyazaki, K., & Kobayashi, S. (1998). Learning Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994).
Deterministic Policies in Partially Observable Learning Without State-Estimation in Partially
Markov Decision Processes. In Proceedings of Observable Markovian Decision Processes. In
the Fifth International Conference on Intelligent Proceedings of the Eleventh International Confer-
Autonomous System (pp. 250-257). ence on Machine Learning (pp. 284-292).
Miyazaki, K., & Kobayashi, S. (2000). Reinforce- Sutton, R. S. (1988). Learning to Predict by the
ment Learning for Penalty Avoiding Policy Mak- Methods of Temporal Differences. Machine
ing. In Proceedings of the 2000 IEEE International Learning, 3, 9–44. doi:10.1007/BF00115009
Conference on Systems, Man and Cybernetics
(pp. 206-211).

284
Exploitation-oriented Learning XoL

Sutton, R. S., & Barto, A. (1998). Reinforcement ENDNOTES


Learning: An Introduction. Cambridge, MA:
MIT Press.
1
Remark that the highest state value is 1 that
is assigned in state 2.
Sutton, R. S., McAllester, D., Singh, S. P., & 2
Theoretically, the performance of SGA is
Mansour, Y. (2000). Policy Gradient Methods not less than that of RA; however, in prac-
for Reinforcement Learning with Function Ap- tice, the property does not always hold true
proximation. Advances in Neural Information since some assumption that is required by
Processing Systems, 12, 1057–1063. the theory has been broken.
Watkins, C. J. H., & Dayan, P. (1992). Technical
note: Q-learning . Machine Learning, 8, 55–68.
doi:10.1023/A:1022676722315
Williams, R. J. (1992). Simple Statistical Gra-
dient Following Algorithms for Connectionist
Reinforcement Learning. Machine Learning, 8,
229–256. doi:10.1007/BF00992696

285
Exploitation-oriented Learning XoL

APPENDIx A: PROOF OF ThEOREM 1

We derive the necessary and sufficient condition to suppress irrational rules in the POMDPs where there
is no type 2 confusion and the number of types of a reward is one. In the first, we consider the local
rationality that can suppress any irrational rule (Theorem A.1). It is derived by two lemmas (Lemma
A.1 and A.2). We characterize a conflict structure where it is the most difficult to suppress irrational
rules (Lemma A.1). For two conflict structures A and B, we say A is more difficult than B when the
class of reinforcement functions that can suppress any irrational rule of A is included in that of B. Then,
we derive the necessary and sufficient condition to suppress any irrational rule for the most difficult
conflict structure (Lemma A.2).

Lemma A.1 (The Most Difficult Conflict Structure)

The most difficult conflict structure has only one irrational rule with a self-loop.
Figure 19 show the most difficult conflict structure where only one irrational rule with a self-loop
conflicts with L rational rules.

Proof of lemma A.1

Although we set L=1, we can easily extend it to any number. Reinforcement of an irrational rule makes it
difficult to learn a rational policy under any reinforcement function. Therefore, the difficulty of a conflict
structure varies monotonically with the number of reinforcements for irrational rules. We enumerate
conflict structures according to the branching factor b, that is the number of state transitions in the same
sensory input, the conflict factorc, that is the number of conflicting rules in it, and examine the number
of reinforcements for irrational rules.
b=1: It is clearly not difficult since there are no conflicts (Figure 20a).
b=2: When there are no conflicts (Figure 20b), it is the same as b=1. We divide the structures of c=2
into two subclasses. One contains a self-loop (Figure 20c), and the other does not it (Figure 20d). In the
case given in Figure 20c, there is a possibility that the self-loop rule is repeatedly selected, while the
non-self-loop rule is selected at most once. Therefore, if the self-loop rule is irrational, it will be rein-
forced more than the irrational rule of Figure 20d.
b ³ 3: When there are no conflicts (Figure 20e), it is the same as b=1. Consider the structure of c=2
(Figure 20f). Although the most difficult case is that the conflict structure has an irrational rule as a
self-loop, even such a structure is less difficult than Figure 20c. Considering the structure of c=3 (Figure
20g), two of the conflict rules are irrational. Therefore, the expected number of reinforcement for one
irrational rule is less than that of Figure 20f.
Similarly, conflict structures of b>3 are less difficult than Figure 20c.
From the above discussion, it is concluded that the most difficult conflict structure is expressed in
Figure 20c. Q.E.D.

Lemma A.2 (Suppressing Only One Irrational Rule with a Self-Loop)

Only one irrational rule with a self-loop can be suppressed if and only if suppression conditions hold.

286
Exploitation-oriented Learning XoL

Figure 19. The most difficult conflict structure

Figure 20. Conflict structures used in the proof

Proof of lemma A.2

Although we set L=1, we can easily extend it to any number. If the rational rule of the most difficult
conflict structure is reinforced by fN, the value of the reinforcement for the conflicting irrational rule
becomes maximal when it has been selected W-N times before the selection of the rational rule. Subse-
quently, the weight of the irrational rule is increased by fN+1 + ⋯ + fW, and that of the rational rule by fN.
From a viewpoint of rationality, the increased weight of the rational rule must be larger than that of the
irrational rule. Such a condition must hold for any later part of the reinforcement interval. Therefore,
suppression conditions are necessary. The sufficiency is evident. Q.E.D.
Using the law of transitivity, the following theorem is directly derived from these lemmas.

Theorem A.1 (Irrational Rules Suppression Theorem)

Any irrational rule can be suppressed if and only if suppression conditions hold.
Theorem A.1 guarantees that the local rationality in reinforcement learning will suppress any ir-
rational rule. A policy that is constructed by rational rules satisfies local rationality. However, we can
construct an example such that the policy is irrational. Next, we discuss the global rationality that can
learn a rational policy.
In the global rationality, the necessity of suppression conditions is evident. We investigate the suf-
ficiency (Lemma A.3).

287
Exploitation-oriented Learning XoL

Figure 21. The environment used in the proof

Lemma A.3 (Sufficiency of Suppression Conditions)

If PS satisfies suppression conditions, it can learn a rational policy in the POMDPs where there is no
type 2 confusion and the number of types of a reward is one.

Proof of lemma A.3

Although we set L=2, we can easily extend it to any number. If a policy is irrational, the policy has to
contain a rewardless loop that does not include any rewards. We need at least two episodes to construct
a rewardless loop. Furthermore, the following inequalities must be maintained in sensory inputs x and
y, that are the exits in the loop of Figure 21,

wxo <wxi ,wyo <wyi (7)

where xo and yo are rules that exit the loop, xi and yi are rules that enter the loop, and ∆ represents the
total reinforcement to be accumulated in a rule.
If the episode with xi contains xo and the irrational rule suppression theorem is satisfied, from theo-
rem A.1, wxo >wxi . Therefore, the episode with xi requires the rule that is not xo to exit the loop.
This also applied to yi. The following inequalities are derived using theorem A.1.

wyo >wxi +wyi , wxo >wyi +wxi (8)

The following inequality is derived using Equation (7) and (8).

wxi +wyi >wxo +wyo > 2(wxi +wyi ) (9)

There is no solution to satisfy the inequality as ∆w > 0 . Then, we cannot construct a rewardless
loop. Therefore, if we use reinforcement functions that satisfy suppression conditions, we can always
obtain a rational policy. Q.E.D.
Theorem 1 is directly derived from this lemma. Q.E.D.

288
Exploitation-oriented Learning XoL

APPENDIx B: PROOF OF ThEOREM 2

First, we derive the necessary and sufficient condition to suppress any irrational rule for the most dif-
ficult conflict structure. If it can be derived, we can extend it to any conflict structure by using the law
of transitivity.
From lemma A.1, the most difficult conflict structure has only one irrational rule with a self-loop. In
this structure, we can derive the following lemma.

Lemma B.1 (Suppressing Only One Irrational Rule with a Self-Loop)

Only one irrational rule with a self-loop in some goal-agent set can be suppressed if and only if Equa-
tion (3)

Proof of lemma B.1

For any reinforcement interval k (k=0,1,…,W-1) in some goal-agent set, we show that there is j (j=1,2,…
,L) satisfying the following condition,

wijk > wik0 (10)

where wijk and wiok are weights of jth rational rule (rijk ) in agent i (i=0,1,…,n-1) and only one irrational
rule with a self-loop (rik0 ) in the agent, respectively (Figure 22).
First, we consider the ratio of the selection number of (rijk ) to (rik0 ) . When n’=1 (the number of the
goal-agent set is one) and L rational rules for each agent are selected by all agents in turn, the minimum
of the ratio is maximized (Figure 23). In this case, the following ratio holds (Figure 24),

rijk : rik0 = 1 : (n − 1)L (11)

Second, we consider weights given to (rijk ) and (rik0 ) . When the agent that obtains the direct reward
R
senses no similar sensory input in W, the weight given to (rijk ) is minimized. It is in k=W. On
M W -1
the other hand, when agents that obtain the indirect reward sense the same sensory input in W, the weight
 W0 
M   1  
given to ri 0 is maximized. It is mR 1 −    in W≥W0.
k

M − 1   M  

Figure 22. The most difficult multi-agent structure

289
Exploitation-oriented Learning XoL

Figure 23. The rule sequence:


(r01k , r10k , , rnk-1 0 , r02k , r10k , , rnk-1 0 , , r0kL , r10k , , rnk-1 0 , r00k , r11k , , rnk-1 0 , r00k , r12k , , rnk-1 0 ,
, r00k , r1kL , , rnk-1 0 , , r00k , r10k , , rnk-1 1, r00k , r10k , , rnk-1 2 , , r00k , r10k , , rnk-1 L )
or all agents on some k. For example, if ‘x changes to O1’ or ‘O2 changes to O1’ in this figure, the
learning in the agent that can select the changing rule occurs more easily

Figure 24. Sample rule sequences at n=3 and L=3. Though the sequence 1 has some partial selection
of rules, the sequence 2 does not have it. The sequence 2 corresponds to Figure 23. Sequence 1 is more
easily learned than sequence 2 as discussed on Figure 23.

Therefore, it is necessary for satisfying condition Equation (10) to hold the following condition,

 W0 
R M   1  
> mR 1 −   (n − 1)L  , (12)
M W −1 M − 1   M  

that is,

M −1
m< . (13)
  W0 
W  1
M 1 −   (n − 1)L 
  M  
 

It is clearly the sufficient condition. Q.E.D.


Theorem 2 is directly derived from this lemma. Q.E.D.

290
Exploitation-oriented Learning XoL

APPENDIx C: PROOF OF ThEOREM 3

First, we show the most difficult environmental structure, where the average number of steps required
to obtain a reward by Policy(PS-r) is the worst in comparison with RA (Lemma C.1). Next, we analyze
its behavior in the structure (Lemma C.2). If it can be derived, we can extend it to all the classes of the
POMDPs where the number of types of a reward is one.

Lemma C.1 (The Most Difficult Environmental Structure)

The most difficult environmental structure is an environment that is shown in Figure 25 where there
exist M−1 actions that are the same as action-b. We term it as structure W .

Proof of lemma C.1

If all rules are rational, there is no difference between PS-r and RA. Therefore, we treat the case as one
where there is an irrational rule.

(i) M=2

In structure W , the rules that are constructed by action-a and b are regarded as irrational and rational,
respectively. If we select action-a in the state that is not state B, we can approach a reward. On the
other hand, if we select action-b in its states, we move to state A, that is the furthest from a reward.
Therefore, in structure W at M=2, if we select a rational rule, the number of steps required to obtain a
reward will be larger. In a structure that has a lesser effect than the case of structure W , the difference
between PS-r and RA reduces. Therefore, in the case of M=2, structure W requires the largest number
of steps to obtain a reward by PS-r in comparison with that of RA.

(ii) M>2

Figure 25. The most difficult environmental structure: structure W

291
Exploitation-oriented Learning XoL

At first, we consider the case where the other irrational rules are added to structure W at M=2. If the
selection probabilities of their rules are not zero, the average number of steps required to obtain a reward
1
should be larger than that of structure W at M=2. In RA, the selection probability of the rule is . On
M
1
the other hand, in PS-r, it is the less than . Therefore, if we compare the same structure, when the
M
other irrational rules are added, the difference between RA and PS-r reduces.
Next, we consider the case where the other rational rules are added to structure W at M=2. In this
case, when all the other rules are the same as the rule that is constructed by action-b, that is the largest
average number of steps required to obtain a reward by PS-r in comparison with RA.
Therefore, structure W requires the largest number of steps to obtain a reward by PS-r in comparison
with RA. Q.E.D.

Lemma C.2 (Comparison Between PS-r and RA in Structure Ω)

In structure W , the maximum value of the average number of steps required to obtain a reward by
Policy(PS-r) divided by that of RA is given by Equation (5)

Proof of lemma C.2

1
In structure W , the average number of steps required to obtain a reward (Va) is Va = where
s(1 − s )n −1
s is the selection probability of action-b where there exist the same M-1 actions. We can calculate
M −1 M −1
s= and for RA and PS-r, respectively. Therefore, calculating Va] for each s, we
M (M − 1) + r
n
 
1 + M − 1
 r 
can get the rate r . Q.E.D.
Mn
Theorem 3 is directly derived from these lemmas. Q.E.D.

292
Section 7
Miscellaneous
294

Chapter 16
Pheromone-Style
Communication for
Swarm Intelligence
Hidenori Kawamura
Hokkaido University, Japan

Keiji Suzuki
Hokkaido University, Japan

ABSTRACT
Pheromones are the important chemical substances for social insects to realize cooperative collective
behavior. The most famous example of pheromone-based behavior is foraging. Real ants use pheromone
trail to inform each other where food source exists and they effectively reach and forage the food. This
sophisticated but simple communication method is useful to design artificial multiagent systems. In this
chapter, the evolutionary pheromone communication is proposed on a competitive ant environment model,
and we show two patterns of pheromone communication emerged through co-evolutionary process by
genetic algorithm. In addition, such communication patterns are investigated with Shannon’s entropy.

INTRODUCTION Swarm behavior is the aggregation of such local


interactions. The usability of multi-agent systems
Swarm intelligence is a type of artificial intel- and swarm intelligence is well-known in various
ligence based on the collective behavior of applications in robotics, optimization problems,
decentralized, self-organized systems (Dorigo distributed computing, Web services, and mobile
& Theraulaz, 1999). The knowledge and informa- technologies. The main topic of such applications
tion processors in swarm intelligence are widely has always been how to design local agents that
decentralized in parts of the system, which are will emerge to demonstrate sophisticated global
called agents. All agents basically have a decision- behavior. We can only design local behavior by
making mechanism obtained from local knowl- implementing agents, although we need sophis-
edge, by local-information processing, and from ticated global behavior. To obtain good designs,
communication channels with cooperative agents. we must understand the relationship between local
design and emerging results.
DOI: 10.4018/978-1-60566-898-7.ch016

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Pheromone-style Communication for Swarm Intelligence

One good way of introducing such relation- Nasanov pheromones for gathering their mates,
ships is provided by nature. Real ants and bees and queen pheromones as a signal to indicate the
are called social insects. Their colonies consist queen is alive.
of many members who attend to various jobs to A good example enabling the relationship
preserve the life of each colony (Sheely, 1995). The between pheromone communication and swarm
sizes of colonies are much too large for members, intelligence to be understood is the foraging
even queens, to comprehend all activities and behavior of real ants. In the first stage of typi-
information. In other words, although individual cal ant-foraging behavior, scouting worker ants
abilities to assess the condition of each colony are individually begin searching for routes from
limited, they can still do their work based only the nest in random directions. When a scouting
on this limited information. The total activities of worker discovers a food source along the route,
colonies, e.g., defense against enemies, repairing it picks up the food and brings it back to the nest
nests, childcare, and foraging, emerge due to the while laying down a pheromone. Consequently,
aggregation of such individual behaviors. More- it releases the first pheromone trail on the ground
over, colonies must balance out the total work from the food source to the nest. The pheromone
appropriately according to changes in the situa- trail plays an important role in collective foraging
tion to optimize their operating costs. It is easy behavior. If other workers around the nest find
to see that communication between members is the pheromone trail, they try to follow it to arrive
very important to attend to the many matters that at the food source. These workers also discover
each colony requires. the food source and then return to the nest while
Many species of social insects not only have reinforcing the intensity of the pheromone trail.
direct- but also indirect-communication channels The intensity of the pheromone trail is succes-
that are equally important attained by using special sively reinforced by the large numbers of workers
chemical substances, which are called “phero- who continue to march to the food source until
mones.” (Agosta, 1992) A pheromone is a chemi- all the food is consumed. No ants reinforce the
cal that triggers a natural behavioral response in pheromone trail after they have removed all the
another member of the same species. When one food. The pheromone gradually evaporates from
member of an ant colony senses a particular in- the ground and the trail automatically dissipates
ternal condition or external stimulus, it responds into the air.
to such a situation and it releases a corresponding This type of sophisticated collective behavior
kind of pheromone into the environment. The can emerge due to the complex effect of local
pheromone is diffused through the environment decision-making, pheromone-communication
by natural characteristics, e.g., evaporation from channels, and natural characteristics of the envi-
the ground, diffusion in the air, and physical ronment. The mechanism based on such complex
contact between the members or the members effects is called “stigmergy” in the research area
and their enemies. By using the effect of such of ethology. In the pheromone mechanism of
translation from the sender to the environment, stigmergy, a member releases a pheromone into
the pheromone signal sends not only a message the environment, which interferences with its
from the sender but also information about the propagation, and the other members detect the
current environment. When the receiver senses pheromone from the environment. This com-
such a pheromone, it causes a particular reaction munication channel enables the entire colony to
in the receiver due to the natural characteristics of organize all of its members and achieve high-level
the species. One kind of honeybee handles over tasks that require coordination and decentraliza-
thirty types of pheromones, which include alarm tion between them (Dorigo & Theraulaz, 1999).
pheromones for warnings about enemy attacks,

295
Pheromone-style Communication for Swarm Intelligence

To investigate what effect stigmergy has had Fernandez & Marin, 2000). Ando et al. succeeded
with artificial pheromones, some researchers have in predicting the future density of traffic conges-
tried to create models of swarming behavior. Col- tion by using an artificial pheromone model on a
lins et al. and Bennett studied the evolutionary road map (Ando, Masutani, Honiden, Fukazawa
design of foraging behavior with neural networks & Iwasaki, 2006). These researchers revealed that
and genetic algorithms (Collins & Jeffersion, artificial stigmergy with pheromone models are
1991; Bennett III, 1996). Nakamura et al. reported useful for applications to multi-agent systems.
the relationship between global effects and local This chapter focuses on the design of multi-
behavior in a foraging model with artificial ants agent systems to successfully complete given tasks
(Nakamura & Kurumatani, 1997). Suzuki et al. with collective behavior and artificial stigmergy.
demonstrated the possibility of pheromones solv- We particularly selected the evolutionary design
ing a deadlocked situation with food-scrambling in pheromone communication. If an agent has
agents (Suzuki & Ohuchi, 1997). These research- a specific advance rule to output a pheromone,
ers revealed the possibility that artificial stigmergy another agent should optimize its reaction to the
based on pheromone models could be used to pheromone to establish communication. How-
replicate swarm intelligence. ever, if an agent has a specific advance reaction
Some good applications of artificial stigmergy to a pheromone, another agent should optimize
to help solve these problems have been proposed. the situation as to when and where it outputs
One of the most successful examples is the ant the pheromone. The possibility of designing
colony optimization (ACO) algorithm proposed evolutionary-pheromone communication is not
by Dorigo et al. (Dorigo, Maniezzo & Colorni, so easy in these cases because it is necessary for
1991; Colorni, Dorigo & Maniezzo, 1991) ACO the communication to simultaneously specify
is a probabilistic technique for solving compu- the reaction to the pheromone and the situation
tational problems that can be reduced to finding to output it.
good paths through the use of graphs (Dorigo & Section 2 proposes an ant war as a competitive
Stutzle, 2004). In ACO, artificial pheromones environment for a task requiring effective agent
are introduced to each path to represent how a communication. Section 3 explains the evolution-
corresponding path is useful for constructing a ary design process for the agent system. Section
solution to a given computational problem. The 4 presents the setting for computer simulation,
series of ACOs perform well in solving various and Section 5 concludes the chapter by clarify-
computational problems. In other research, Sauter ing the evolutionary process to form artificial
et al. proposed the use of digital pheromones for pheromone communication from the viewpoint
controlling and coordinating swarms of unmanned of information theory.
vehicles, and demonstrated the effectiveness of
these pheromone algorithms for surveillance,
target acquisition, and tracking (Sauter, Mat- ANT WAR AS COMPETITIVE
thews, Parunak & Brueckner, 2005). Mamei et al. ENVIRONMENT
proposed a simple low-cost and general-purpose
implementation of a pheromone-based interaction An ant war is a competitive environment for two
mechanism for pervasive environments with RFID teams of ant-like agents (Kawamura, Yamamoto,
tags (Mamei & Zambonelli, 2007). Sole et al. dis- Suzuki & Ohuchi, 1999; Kawamura, Yamamoto
cussed that behavioral rules at the individual level & Ohuchi, 2001). The environment is constructed
with a pheromone model could produce optimal on 44X80 grids, and these two are for a blue-ant
colony-level patterns (Sole, Bonabeau, Delgado, and a red-ant team. Each team consists of 80

296
Pheromone-style Communication for Swarm Intelligence

Figure 1. The outline of ant war competition. This figure has been reproduced by permission from
Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary Pheromone Com-
munication Based on External Measurement and Emergence of Swarm Intelligence, Japanese Journal
of the Society of Instrument and Control Engineers, 37(5), 455–464.

homogeneous agents initially placed in random P (x, y, t + 1) = P (x, y, t) +


positions. Some food packs are initially placed on P (x − 1, y, t) + P (x + 1, y, t) +
 
the center line of the environment. The purpose 
of each team is to carry the food packs to its own ³ dif P (x, y − 1, t) + P (x, y + 1, t) − +
 
goal line and to collect more food packs than the 5P ( x, y, t) 

opposing team. The competition in the ant war is ³ eva T(x, y, t),
outlined in Figure 1.
The pheromone field for both teams in the
environment is defined with two variables T(x,
y, t) and P(x, y, t). T(x, y, t) is the intensity of the where γeva is the pheromone evaporation rate from
pheromone on the grid, (x, y), at time t, and P(x, the ground and γdif is the pheromone diffusion
y, t) is the aerial density of the pheromone on grid rate into the air. Q is the amount of pheromone
(x, y) at time t. Between times t and t + 1 these released by one ant agent. According to these
pheromone field variables are updated as equations the pheromone released by the ant
agent gradually evaporates from the ground and
T (x, y, t + 1) = (1 − ³ eva )i diffuses into the space.
T (x, y, t) + ∑Tk (x, y, t), Each agent has a limited 10-bit sensor, denoted
k as i1i2,…,i10, that can obtain information about
its neighboring grids. First, six bits, i1i2,…,i6, are
Q determined according to the six rules.
T k (x, y, t) =  ,
 0
 • Whether the agent touches a food pack or
not.
If agent k lays off the pheromone on grid (x,y) • Whether the agent contacts a mate who
at time t otherwise: tries to carry a food pack or not.
• Whether the agent contacts a mate in front
of it or not.

297
Pheromone-style Communication for Swarm Intelligence

Figure 2. The firing probability of pheromone sensory inputs. This figure has been reproduced by
permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

• Whether the agent contacts a mate to its where T is the parameter of sensitivity. The firing
right or not. probability takes a higher value in responding to
• Whether the agent contacts a mate to its relatively high-density pheromones (see Figure 2).
left or not. Although the pheromone sensor only has binary
• Whether the agent contacts a mate behind values, agent k can react according to the density
it or not. of pheromones. Since the agent in this model
must determine its actions according to sensory
An additional four bits,i7i8i9i10, stochastically information only about the neighborhood, effec-
respond to the aerial densities of the pheromone tive communication through pheromone channels
on four neighboring grids. Here, let (x, y) be is important to win in this competition.
the position of agent k and At time t, each individual agent can select one
(x − 1, y ), (x + 1, y ), of seven actions, going forward, backward, left,
(x ′, y ′) ∈   be the right, standing by in the current position, pulling
(x, y − 1),(x, y + 1) 
  a food pack, or laying a pheromone in the current
sensing position of agent k. The firing probabil- position. If the agent wants to pull a food pack
ity of each pheromone bit is defined as and actually apply force to the target food, it has
to satisfy a condition where it touches the target
Pfire (x ′, y ′) = food pack or a mate who is actually pulling the
1 target food pack. The food pack is moved in a tug
  P (x ′, y ′, t) − P (x, y, t) of war. More agents can move the food pack more
  
1 + exp − quickly. Then, the food pack crossing the goal
ave

 T 
  line is removed from the environment and the
 
team is awarded a score.
The competition is finished when time t expires
P (x − 1, y, t) +
  or all the food packs have crossed the goal line.
P x + 1, y, t + The winner is the team who has collected more
Pave (x, y, t) = 
( )  / 4, food packs than its opponent.

P (x, y − 1, t) +
 
P(x, y + 1, t) 
 

298
Pheromone-style Communication for Swarm Intelligence

Figure 3. The outline of evolutionary computation.


DECISION-MAKING
This figure has been reproduced by permission
from Kawamura, H., Yamamoto, M. & Ohuchi, A.
Each time every agent receives 10 bits of sensory
(2001).: Investigation of Evolutionary Pheromone
information and selects one action from seven
Communication Based on External Measurement
choices. A simple two-layer neural network for
and Emergence of Swarm Intelligence, Japanese
individual decision-making is determined, whose
Journal of the Society of Instrument and Control
network has a simple structure and is flexible to
Engineers, 37(5), 455–464.
represent various decision-making functions. All
the agents in each team have an identical neural
network with the same set of weight parameters.
Let ij be the j-th input bit and Ok be the probability
to select the k-th action from seven actions. The
neural network maps from the input vector to the
output vector. The weight vector of the neural
network is represented as wjk, and the mapping
is done as follows.

−1
  11  • Step 1: Two chromosomes are randomly



o k = 1 + exp −∑w jk i j 
 j=1 
/ selected from the population.
• Step 2: Ant-agent teams with the selected
−1
7   11  chromosomes compete in the ant war. The
  
∑ 1 + exp −∑w jh i j  , loser is removed from the population.
h =1    j=1 
• Step 3: The chromosomes of the winner
are copied to two prototypes and the chro-
mosomes of these two are varied by cross-
where i11 is the additional input and -1 is always over and mutation operations.
set as the bias for the neural network. • Step 4: Two new chromosomes are re-
turned to the population.
• Step 5: Go back to Step 1 until the final
EVOLUTIONARY PROCESS iteration.

The main topic of this chapter is to explain how Crossover operation exchanges the weight
multi-agent systems organize artificial pheromone values at the same locus of two chromosomes
communication with evolutionary computations. with the probability, Pc. Mutation operation adds
The chromosomes in the computations are con- noise from [-0.5, 0.5] to each weight with the
structed with a set of weights, wjk. At the initial probability, Pm.
stage of evolution, all wjk are initialized with a
random value from [-0.5, 0.5]. The evolutionary
computation has N chromosomes and these chro- ExPERIMENT
mosomes evolve through a five-step procedure
(see Figure 3). A computer experiment was carried out with 10
trials. The maximum simulation time in each

299
Pheromone-style Communication for Swarm Intelligence

Figure 4. An example distribution of obtained pheromones. This figure has been reproduced by permis-
sion from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary Pheromone
Communication Based on External Measurement and Emergence of Swarm Intelligence, Japanese
Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

ant-war environment was 2000. The parameters pack based on the pheromone’s level of intensity.
of the artificial pheromones, Q, γeva, and γdif were This type of pheromone attracts their mates and
set to correspond to 50, 0.1, and 0.1. There were we called it an “attractive pheromone.” There is
20 chromosomes, and the final generation of the an example distribution of attractive pheromones
evolutionary process was 50000. The parameters in Figure 4 (A).
for evolutionary operations, Pc and Pm, were set Repulsive pheromone: An ant using this type
to correspond to 0.04 and 0.08. These settings of pheromone has a tendency to dislike it. When
were determined through various preliminary exploring, this ant scatters the pheromone on the
experiments. ground. Once the ant finds a food pack, it stops
to release the pheromone. As a result of such
behavior, the pheromone field leaves the food
RESULTS pack and enters the environment. This means that
ants mark space with the pheromone that has
Two types of pheromone communications were already been explored, which is unnecessary to
self-organized in the final generation through 10 re-explore, and they therefore save time in ef-
trials. We called these two attractive and repulsive fectively finding the food pack. As this type of
pheromones. There were seven attractive phero- pheromone repulses their mates, we called it a
mones, and three repulsive pheromones. “repulsive pheromone.” There is an example
Attractive pheromone: An ant based on this distribution of repulsive pheromones in Figure 4
type of pheromone basically walks randomly and (B).
goes about exploring the environment when not Although an artificial evolutionary mechanism
sensing a special stimulus. Once sensing contact where winners survive and losers disappear gen-
with a food pack, the ant tries to carry it and releases erated these two types of ant behaviors, it is not
the pheromone on the ground. The pheromone clear whether these are sufficiently dominant in the
released by such an ant diffuses near the food evolutionary process. To evaluate the evolution of
pack, and its intensity in the environment is based ant strategies, we depicted the winning percentage
on the food pack. The other ants who have sensed of 100 competitions between successive genera-
the pheromone try to effectively reach the food tions versus the final generation. The evolutionary

300
Pheromone-style Communication for Swarm Intelligence

Figure 5. The evolutionary transition of the winning percentage in the two types. The opponent of the
competition is the final generations. This figure has been reproduced by permission from Kawamura, H.,
Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary Pheromone Communication Based
on External Measurement and Emergence of Swarm Intelligence, Japanese Journal of the Society of
Instrument and Control Engineers, 37(5), 455–464.

transition of the winning percentage in the two We next measured the evolutionary process of
types is shown in Figure 5. The X-axis indicates communication by quantifying the effectiveness
the number of generations, and the Y-axis indicates of pheromone sensor inputs and the uniqueness of
the winning percentage. The percentages of both the situation where the ant released the pheromone.
types increase as each generation progresses. This Shannon’s entropy and mutual information from
graph plots the evolutionary process for dominant information theory were selected to measure the
behavior to emerge. degree of communication (Shannon & Weaver,
The tendencies of both lines differ in terms of 1964). Formally, the entropy of a discrete vari-
evolution speed, i.e., the attractive pheromone able, X, is defined as:
evolved quicker than the repulsive. The difference
in evolutionary speed may have been caused by H (X ) = −∑p (x ) log p(x),
the strength of the relationship between the out- x ∈X

come of competition and pheromone communica-


tion. The use of the attractive pheromone was
simple and the ant climbing up the pheromone where p(x) is the marginal probability-distribution
field contributed effectively to the competition. function of X.
However, the use of the repulsive pheromone was The mutual information of two discrete vari-
a little complicated because the ant was going ables X and Y is defined as:
down the pheromone field and did not always
reach the food pack. The relationship between the I (X; Y ) = H (X ) − H(X | Y)
outcome and communication was weaker than
for the attractive pheromone.

301
Pheromone-style Communication for Swarm Intelligence

Figure 6.The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

H (X|Y ) = − ∑ p (x, y) log p(x | y),


x ∈ X, y∈ Y
=
P {(i , i ,…, i ); i , i ,…, i
7 8 10 7 8 10 }
∈ {0, 1} ,

where H(X|Y) represents the conditional entropies,


where Õ has seven elements, S  has 26=64 ele-
p(x,y) is the joint-probability-distribution func-
ments, and P  has 24=16 elements. The probabil-
tion of X and Y, and p(x | y) is the conditional-
distribution function of X given Y. ity of each element is denoted by p(•), and each
Entropy H(X) represents the uncertainty of p(•) is calculated by using the results of a com-
X, and mutual information I(X;Y) derives how petition. Using these probabilities, the entropy of
much knowing one of these variables reduces our output, H(Õ), mutual information on sensory
uncertainty about the other. inputs without pheromones, I (O ) , and mutual
, S
To measure the effect of sensory inputs for information on pheromone sensory inputs,
,
decision-making, three discrete variables, Õ, S I (O  ) , are measured.
, P
and P , are introduced. O is the variable of out-
 Figures 6 and 7 show the evolutionary transi-
put, S is the set of sensory inputs without the
 is the set of the pheromone tions of H(Õ), I (O ) , and I (O
, S  ) for attractive
, P
pheromone, and P
sensory inputs. The formal description of these and repulsive pheromones. The X-axis indicates
variables is: the number of generations and the Y-axis indicates
Õ={forward, backward, left, right, standby, the degrees of entropy and mutual information.
pull a food pack, drop off pheromone,} The H(Õ), is falling to around 1.3 at the 2,500th
generation for the attractive pheromone and is
almost stable after that. This means that the ant
=
S {(i , i ,…, i ); i , i ,…, i ∈ {0, 1}},
1 2 6 1 2 6
self-organized the most important reaction in the
early stage of evolution, and that the reaction was

302
Pheromone-style Communication for Swarm Intelligence

Figure 7. The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

pulling a food pack after finding it. The I (O )


, S discrete variables, I and Ã, are introduced. I is
the set of whole sensory inputs, and à is a discrete
and I (O  ) are gradually increasing from the
, P
variable that represents whether the ant releases
first generation to around the 25,000th, and satu- pheromones or not. The formal description of
rate after that. From the viewpoint of information these variables is:
theory, this measured transition means that the
ants were self-organizing the effectiveness of the
environmental inputs and pheromone inputs for
I = {(i , i ,…, i ); i , i ,…, i
1 2 10 1 2 10 }
∈ {0, 1} ,
decision-making. The importance of the environ-
Ã={drop off pheromone, otherwise}.
mental inputs and pheromone inputs in this evo-
The I has 210 = 1024 elements, and à only
lutionary path are almost same to determine action.
The H(Õ) has fallen once to around 1.4 at the has two elements. The H (I ) is the entropy of the
5,000th generation for the repulsive pheromone,  ) is the mutual
set of sensory inputs, and I (I, A
then has gradually increased and decreased, before information of Ã.
finally reaching around 1.3. The I (O ) and
, S
Figures 8 and 9 show the evolutionary transi-
I (O  ) are gradually increasing according to the
, P  ) for attractive and
tions of H (I) and I (I, A
progress of evolution. Here, the value of environ- repulsive pheromones. In both cases, H (I ) takes
mental inputs is larger than that of pheromone
a lower value in the early generations and gradu-
inputs. This means that the environmental sensors
ally increases as evolution progresses. The lower
play a more important role than the pheromone
value of H (I ) in the early generations was caused
sensors to determine action. The field of repulsive
pheromones is unstable compared with that of by inequality in action-selecting probabilities.
attractive pheromones. The randomly generated initial-neural networks
To measure the uniqueness of the pheromone- were not equal in action-selecting probabilities.
releasing action in sensory inputs, two additional Consequently, almost all ants gathered in a corner
of the environment and their sensory inputs were

303
Pheromone-style Communication for Swarm Intelligence

Figure 8. The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

Figure 9. The evolutionary transition of entropy and mutual information. This figure has been reproduced
by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation of Evolutionary
Pheromone Communication Based on External Measurement and Emergence of Swarm Intelligence,
Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

biased in specific patterns. After evolution, the and 11. Here again, I (O,P
  ) corresponds to the
ants scattered here and there to effectively search effectiveness of pheromone sensory inputs on
for a food pack and the variations in sensory inputs
decision-making, and I (I,A
 ) corresponds to the
were wider than those in the early generations.
To investigate the evolutionary process of uniqueness of sensory-input patterns for selecting
 ) , we plotted scatter diagrams
  ) and I (I,A pheromone-release actions. The graphs indicate
I (O,P
 ) have a
  ) and I (I,A
that both values of I (O,P
with pairs observed with these values. The scatter
diagrams for attractive-pheromone evolution and distinct positive correlation and the values of the
repulsive-pheromone evolution are in Figures 10 pair are increasing together step by step. This
suggests that the situation’s uniqueness in sending

304
Pheromone-style Communication for Swarm Intelligence

Figure 10. The scatter diagrams of mutual information for attractive-pheromone evolution. This figure has
been reproduced by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation
of Evolutionary Pheromone Communication Based on External Measurement and Emergence of Swarm
Intelligence, Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

Figure 11. The scatter diagrams of mutual information for repulsive-pheromone evolution. This figure has
been reproduced by permission from Kawamura, H., Yamamoto, M. & Ohuchi, A. (2001).: Investigation
of Evolutionary Pheromone Communication Based on External Measurement and Emergence of Swarm
Intelligence, Japanese Journal of the Society of Instrument and Control Engineers, 37(5), 455–464.

a signal and the reaction’s uniqueness in receiving CONCLUSION


a signal are two sides of the same coin in evolu-
tionary agent design and artificial-pheromone This chapter proposed an ant war as a competitive
communication is gradually formed while the environment enabling studies of artificial stig-
same pace is maintained. mergy and the origins of pheromone-style commu-
nication in evolutionary agent design. Two types
of collective intelligence were acquired through

305
Pheromone-style Communication for Swarm Intelligence

computer simulations, i.e., ants with attractive and Dorigo, M., Maniezzo, V., & Colorni, A. (1991).
repulsive pheromones. Both types demonstrated Positive Feedback as a Search Strategy (Techni-
rational strategies to win the competition and these cal Report No. 91-016). Politecnico di Milano.
strategies effectively utilized the characteristics
Dorigo, M., & Stutzle, T. (2004). Ant Colony
of artificial pheromones and the environment.
Optimization. Cambridge, MA: The MIT Press.
We introduced Shannon’s entropy and mutual
information on artificial pheromones to measure Kawamura, H., & Yamamoto, M. Suzuki &
the situation’s uniqueness in sending pheromones Ohuchi, A. (1999). Ants War with Evolutive
and the reaction’s uniqueness in receiving phero- Pheromone Style Communication. In Advances
mones. Such uniqueness represented two sides in Artificial Life, ECAL’99 (LNAI 1674, pp.
of the same coin and artificial-pheromone com- 639-643).
munication was gradually formed while the same
Kawamura, H., Yamamoto, M., & Ohuchi, A.
pace was maintained.
(2001). (in Japanese). Investigation of Evolu-
tionary Pheromone Communication Based on
External Measurement and Emergence of Swarm
REFERENCES
Intelligence. Japanese Journal of the Society of In-
Agosta, W. (1992). Chemical Communication strument and Control Engineers, 37(5), 455–464.
– The Language of Pheromone. W. H. Freeman Mamei, M. & Zambonelli, F. (2007). Pervasive
and Company. pheromone-based interaction with RFID tags.
Ando, Y., Masutani, O., Honiden, S., Fukazawa, ACM Transactions on Autonomous and Adaptive
Y., & Iwasaki, H. (2006). Performance of Phero- Systems (TAAS) archive, 2(2).
mone Model for Predicting Traffic Congestion. Nakamura, M., & Kurumatani, K. (1997). For-
In . Proceedings of AAMAS, 2006, 73–80. mation Mechanism of Pheromone Pattern and
Bennett, F., III. (1996). Emergence of a Multi- Control of Foraging Behavior in an Ant Colony
Agent Architecture and New Tactics for the Ant Model. In Proceedings of the Fifth International
Colony Food Foraging Problem Using Genetic Workshop on the Synthesis and Simulation of
Programming. In From Animals to Animats 4, Living Systems (pp. 67 -74).
Proceedings of the Fourth International Confer- Sauter, J., Matthews, R., Parunak, H., & Brueckner,
ence on Simulations of Adaptive Behavior (pp. S. (2005). Performance of digital pheromones for
430–439). swarming vehicle control. In Proceedings of the
Bonabeau, E., Dorigo, M., & Theraulaz, G. (1999). fourth international joint conference on Autono-
Swarm Intelligence from Natural to Artificial mous agents and multiagent systems (pp. 903-910).
Systems. Oxford University Press. Shannon, C., & Weaver, W. (1964). The Mathe-
Collins, R., & Jeffersion, D. (1991). AntFarm: matical Theory of Communication. The University
Towards Simulated Evolution. In Artificial Life II, of Illinois Press.
Proceedings of the Second International Confer- Sheely, T. (1995). The Wisdom of the Hive: The
ence on Artificial Life (pp. 159–168). Social Physiology of Honey Bee Colonies. Harvard
Colorni, A., Dorigo, M., & Maniezzo, V. (1991). University Press.
Distributed Optimization by Ant Colonies. In .
Proceedings, ECAL91, 134–142.

306
Pheromone-style Communication for Swarm Intelligence

Sole, R., Bonabeau, E., Delgado, J., Fernan- Suzuki, K., & Ohuchi, A. (1997). Reorganization
dez, P., & Marin, J. (2000). Pattern Forma- of Agents with Pheromone Style Communica-
tion and Optimization in Army Raids. [The tion in Mulltiple Monkey Banana Problem. In .
MIT Press.]. Artificial Life, 6(3), 219–226. Proceedings of Intelligent Autonomous Systems,
doi:10.1162/106454600568843 5, 615–622.

307
308

Chapter 17
Evolutionary Search for
Cellular Automata with Self-
Organizing Properties toward
Controlling Decentralized
Pervasive Systems
Yusuke Iwase
Nagoya University, Japan

Reiji Suzuki
Nagoya University, Japan

Takaya Arita
Nagoya University, Japan

ABSTRACT
Cellular Automata (CAs) have been investigated extensively as abstract models of the decentralized
systems composed of autonomous entities characterized by local interactions. However, it is poorly un-
derstood how CAs can interact with their external environment, which would be useful for implementing
decentralized pervasive systems that consist of billions of components (nodes, sensors, etc.) distributed in
our everyday environments. This chapter focuses on the emergent properties of CAs induced by external
perturbations toward controlling decentralized pervasive systems. We assumed a minimum task in which
a CA has to change its global state drastically after every occurrence of a perturbation period. In the
perturbation period, each cell state is modified by using an external rule with a small probability. By
conducting evolutionary searches for rules of CAs, we obtained interesting behaviors of CAs in which
their global state cyclically transited among different stable states in either ascending or descending
order. The self-organizing behaviors are due to the clusters of cell states that dynamically grow through
occurrences of perturbation periods. These results imply that we can dynamically control the global
behaviors of decentralized systems by states of randomly selected components only.

DOI: 10.4018/978-1-60566-898-7.ch017

Copyright © 2011, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Evolutionary Search for Cellular Automata with Self-Organizing Properties

INTRODUCTION On the one hand, there are studies on CAs


with relaxed restrictions so as to investigate the
A cellular automaton (CA) is a discrete model behaviors of the decentralized systems in realistic
consisting of a regular grid of finite automata called situations from a variety of viewpoints. Ingerson
cells. The next state of each cell is completely & Buvel (1984) pointed out that the synchronous
decided by the current states of its neighbors updating rule is unnatural if we regard them as
including itself. CAs are well-suited for investi- abstractions of decentralized systems in a real
gating the global behavior emerging from local world, and analyzed elementary CAs with asyn-
interactions among component parts. A CA can chronous updating rules. They demonstrated that
be interpreted as an abstract model of multi-agent the global pattern of the asynchronous CA self-
systems (MAS) if we regard each cell in a CA as organized into a regular array, and argued that
an agent in a MAS, because the multiple autono- asynchronous CA might be useful for understand-
mous agents in a MAS locally interact with each ing self-organization in a real world.
other in general, as shown as Figure 1. MAS can Also, there are several discussions on the ef-
be constructed as a large-scale space by using CA fects of influences on the system from outside
and it is also easy to visualize. Therefore, there are such as boundary conditions and perturbations.
several applications for MAS which make use of Ninagawa, Yoneda & Hirose (1997) focused on
computational and emergent properties of CAs, influences of the differences, and compared the
including traffic simulations (Sakai, Nishinari & effects of the dissipative boundary conditions (in
Iida, 2006), ecological modeling (Nagata, Morita, which each cell state is randomly decided at each
Yoshimura, Nitta & Tainaka, 2008), controlling step) on the global behaviors of CAs with those of
decentralized pervasive systems (Mamei, Roli & the standard periodic boundary conditions. They
Zambonelli, 2005). showed that the CAs with the former condition
There have been a number of studies on basic can eliminate the size effects of the grid on the
characteristic of CAs under the assumption of global behaviors which are prominent in CAs
strong restrictions such as a regular arrangement with the latter condition. Marr & Hütt (2006)
of cells and synchronous update of the cell states. showed that based on comparison of space-time
It is well known that Wolfram suggested that diagrams of configurations and some of indices,
elementary (one-dimensional two-state three- the modifications of the neighboring structures on
neighbor) CAs fall into four classes. In particu- the regular grid can be functionally equivalent to
lar, he pointed out that CAs in class four exhibit the introduction of stochastic noises on the states
complex behaviors, and some of them achieve of the cells in CAs that basically exhibit chaotic
computational universality (Wolfram, 2002). behaviors.

Figure 1. A cellular automaton (CA) as an abstract model of multi-agent systems (MAS)

309
Evolutionary Search for Cellular Automata with Self-Organizing Properties

The self-organizing behaviors of CAs can be pattern of the states. However, these discussions
dynamically affected by the external influences. were based on several hand-coded rules for CAs.
Several models of CAs which have such a property Thus, it is still an open question how CAs can
have been proposed for controlling decentral- show emergent properties through the interactions
ized pervasive systems (Figure 2) (Mamei et with an external world.
al., 2005 and Kwak, Baryshnikov & Coffman, However, in general, there are complex rela-
2008). Decentralized pervasive systems consist tionships between the global behaviors of CAs
of distributed components (nodes, sensors, etc.) in and the local behaviors of cells. It is difficult to
our everyday environments and the systems per- design the rules for CAs by hand-coding which
form global tasks using their whole components. exhibit the desired emergent behaviors. Thus,
Mamei et al. (2005) focused on the influences there have been various studies based on evolu-
of the external world on decentralized pervasive tionary searches for rules of CAs that can ex-
systems, and constructed asynchronous CA with hibit emergent behaviors (Mitchell, Crutchfield
external perturbations termed ``dissipative cellular & Hraber, 1994). Rocha (2004) evolved CA rules
automata’’ (Roli & Zambonelli, 2002) in that the that can solve the density task, and discussed
external environment can somehow inject energy about the nature of the memory-like interactions
to dynamically influence the evolution of the au- among the particles of the cell states emerged for
tomata. They regarded the asynchronous CA as a storing and manipulating information. Ninagawa
group of autonomous individuals which locally in- (2005) also evolved CAs which generate 1/f noise
teract with each other, and introduced continuously where the power is inversely proportional to the
occurring perturbations on the states of the cells frequency, and obtained a CA of which behavior
into their model. The perturbations correspond is similar to the Game of Life. As above, an evo-
to the influences caused by the external world on lutionary search for rules of CAs will be also
the group. The CAs presented regular patterns if useful for understanding the characteristics of
and only if they are induced by the perturbations, systems that exhibit self-organizing properties
such as a stripe pattern. Moreover, they argued caused by interactions with an external environ-
about applications of such self-organized features ment.
for controlling decentralized pervasive systems This chapter focuses on the emergent properties
such as sensor networks. For example, based on of CAs induced by external perturbations toward
the experiments of dissipative CAs, they showed controlling decentralized pervasive systems. We
possible scenarios that the global state of the CA assumed a minimum task in which CAs have to
can be changed to the desired pattern only by change its global state after every occurrence of
filling up the limited area of the CA with a fixed perturbation period, and searched the rules for

Figure 2. CAs with external perturbations aimed at controlling decentralized pervasive systems

310
Evolutionary Search for Cellular Automata with Self-Organizing Properties

CAs which can solve the task by using a genetic ´ n(St ) (Pa )

algorithm (GA) (Iwase, Suzuki & Arita, 2007). q i,t +1j =  t i, j (1)
 q i, j n (1 - Pa ) ,
We obtained the rules for CAs in which global 
state of evolved CAs cyclically transited among
different stable states of which the number is more where δ is a local transition rule which maps a
than that of distinct cell states, and looked into the configuration of cell states in a neighborhood (3×3
self-organizing properties that a drastic change cells around the focal cell (i, j)) Sti, j into a state.
in its global state occurs every two successive Also we introduce an external perturbation ε
occurrences of perturbation periods. which changes a cell state independently of δ. ε
expresses a simple transition rule that increments
the value of the cell state as defined by
TASK
( )
e : qit,+j 1 := qit,+j 1 + 1 mod M . (2)
We constructed a task that self-organizing behav-
iors induced by external perturbations are required
to solve, and we searched the rules of CAs for
solving the task by using GA. In an evaluation ε is applied to each cell with a probability Pe every
process of a CA, there are the fixed number of after transitions of cell states by Equation 1. We
perturbation periods in which each cell state is expect that some of relationship between the global
modified by using an external rule with a small behavior of CAs and external perturbations can
probability. The CA has to change its configuration occur by introducing Equation 2. It is because that
represented by the distribution ratio of cell states the actual effect of a perturbation on the CA (a
after every occurrence of a perturbation period. change in the cell state) is deterministic although
This is non-trivial if the number of the perturbation it occurs probabilistically.
periods is larger than the number of possible cell
states because the global behavior of CAs must Transition and Evaluation
stably exhibit different configurations composed
of intermediate distribution ratios. Thus the CAs Figure 3 is a time diagram of an evaluation process
should need some kind of emergent properties that for a rule (δ) of a CA described above. Starting
utilize the occurrences of perturbations effectively. from the initial condition in which each cell state
is randomly assigned, the transitions without
Cellular Automata and Perturbations perturbations (Pe = 0.0) occur for Lpre steps so
that effects of the initial condition are eliminated.
We adopt a two-dimensional (N × N) M-state Next, a perturbation period of Ld steps occurs
nine-neighbor (Moore neighborhood) cellular every after a normal period of approximately Lint
automata with periodic boundary condition as steps. For each cell, a perturbation occurs with
an abstract model of the distributed systems a probability Pe = β during perturbation periods,
composed of autonomous entities characterized and it does not occur during normal periods (Pe =
by local interactions. 0). The evaluation stops when the normal periods
A cell (i, j) have a state qti, j ∈ {0, …, M − 1} have occurred for D + 1 times. Note that the actual
at time step t. At each time step t, each cell state is time step at which each perturbation period starts
asynchronously updated with a probability Pa by fluctuates randomly within a specific range (±Lfluct
steps) as shown in Figure 3.

311
Evolutionary Search for Cellular Automata with Self-Organizing Properties

A density distribution of cell states ρt at time + ρθi ,ϕ (M − 1) − ρθj,ϕ (M − 1) , (5)


t is a vector consisting of the ratios ρt (s) of the
cell state s among all cells, which is defined by
fθ,ϕ = ∑ Áθi ,ϕ − Áθj,ϕ . (6)
i ≠ j ∈{a (0 ),a (1),,a (D )}
1 1 if q = s
t

rt (s ) = ∑ 
i, j
 otherwise,
N ×N (i , j )∈N ×N 0
 Equation (5) defines the difference between
Át = { t
r (0), r (1), , rt (M − 1), .
t
} the scaled density distributions as the sum of the
(3) absolute differences between the corresponding
elements of the distributions. The fitness of δ is
Also, in order to stress the existence of a small the sum of the differences over all possible pairs
amount of each cell state, we define a scaled of the scaled density distributions at the last steps
density distribution ρtθ,φby of the normal periods.
Thus, the CAs have to use an occurrence of
1 a perturbation period as a trigger to change their
SFθ,ϕ (x ) = ,
1 +e −(x −θ )×ϕ own global configuration dynamically.
ρθt ,ϕ (s ) = SFθ,ϕ (ρt (s )),
Áθt ,ϕ = {ρ t
θ,ϕ }
(0), ρθt ,ϕ (1), , ρθt ,ϕ (M − 1), ,
EVOLUTIONARY SEARCh BY
(4)
GENETIC ALGORIThM
where SFθ,φ is a sigmoid function which scales ρt
Transition Rule
(s) with a threshold θ, and φis a parameter for a
degree of scaling. Equation 4 means that if ρt (s)
We optimize the rules for CAs to maximize the
is more than θ, ρtθ,φ becomes close to 1. Otherwise,
fitness defined above by using GA. We adopt a
it becomes close to 0.
transition rule based on the number of each cell
Here, we take ρtθ,φ as the global configuration
state in the neighborhood, expecting emergence
of the CA, and define the fitness by using ρtθ,φ at
of interesting behaviors of the CA and reduction
the last steps of normal periods (ρa(0)θ,φ, ρa(1)θ,φ, …,
of the search domain for GA.
ρa(D)θ,φ in Figure 3) as follows:
Figure 4 illustrates an example of the transition
rules in the case of M = 3. This rule is an extended
Áθi ,ϕ − Áθj,ϕ = ρθi ,ϕ (0) − ρθj,ϕ (0) + version of outer-totalistic rules. The pattern of the
ρθi ,ϕ (1) − ρθj,ϕ (1) +  neighborhood configuration Sti, j at the cell (i, j)
is given by

Figure 3. Repeated occurrences of external perturbations

312
Evolutionary Search for Cellular Automata with Self-Organizing Properties

1 if qit+k , j +l = s The above description of the rule largely re-


t
n (s ) =
i, j ∑  otherwise, duces the search domain for GA. In case of em-
|k |≤1,|l |≤1,|k |+|l |≠0 0
 ploying two-dimensional M-state nine-neighbor
Sit, j = { }
qit, j , nit, j (0), nit, j (1), , nit, j (M − 1) . CAs, the maximum number of the transition rules
(7) is MM9 because there are M9 distinct neighborhood
configurations at a maximum. On the contrary,
if we adopt rules described above, the possible
nti, j (s) is the number of the cell state s in the number comes down to M8 + (M− 1)CM –1 1.
neighborhood except for the cell (i, j) itself. Sti, j
is a set of the focal cell state and the number of Genetic Information and
each cell state as illustrated in Figure 4 - A. Evolutionary Operations
Since the changes of the CAs’ global state are
evaluated with the densities of cell states only, We used a form of GA to evolve CAs to perform
there would be existing rules which can be applied the task described above. Each individual in GA
to some situations if the cell states on the rules has a string consists of genes gl. Each gl represents
are uniformly changed by something method. the cell state into which the corresponding pat-
Therefore, in concert with the external perturba- tern of the neighborhood is mapped in δ0 (Figure
tions, we further introduced a transitivity into the 4 - C). The population of I individuals is evolved
transition rule δ defined as follows: over G generations as follows:

dq(n(0), n(1), , n(M − 1)) = 1. All gl s in the initial population is initialized


d(q, n(0), n(1), , n(M − 1)), with 0.
dq(n(0), n(1), , n(M − 1)) = (8) 2. The string of genes of each individual is
d 0(n(cs(q, 0)), n(cs(q, 1)), , translated to the transition rule, and then its
  mod M fitness is calculated by using the evaluation
n(cs(q, M − 1))) + q 
 process described above.
(cs(q, x ) = (q + x ) mod M ), 3. The top E individuals are copied without
modification to the next generation. The
remaining I - E individuals for the next
generation are selected with the probability
where δq is a map from a neighboring configura-
equal to its relative fitness within the whole
tion in which the focal cell state is q to a state.
population, and they pair with each other
Equation 8 means that δq (q = 1, …, M − 1) can
as parents.
be obtained by incrementing each value in the
equation of δ0 as shown in Figure 4 - B.

Figure 4. An example of applying the transition rules (M = 3)

313
Evolutionary Search for Cellular Automata with Self-Organizing Properties

4. Two offsprings are generated from the fitness, and the worst fitness was almost zero
pair based on a two-point crossover with a through the experiment.
probability Pcrossover and a mutation for each As defined in Equation (6), the fitness is the
gene with a probability Pmutation. A mutation sum of the differences between the scaled den-
changes gl to a random value (0 ≤ gl < M) sity distributions. So as to grasp the main reason
except for the current value. for the increase in fitness through the experiment,
5. The E elites and I – E offsprings form the we plotted the all differences measured during
population of the new generation, and the the evaluation processes of all individuals at each
process goes back to the step 2 until the generation in Figure 6. After approximately the
generation reaches G. 30th generation, we see that the difference often
became the maximum value 3.0 which was ob-
ExPERIMENTAL RESULTS tained when the difference between the mutually
AND ANALYSES opposite distributions were measured. It clearly
shows that the evolved CAs successfully exhib-
Course of Evolution ited the self-organizing behaviors that their
global configurations drastically changed after
We adopted the settings of parameters as follows: the occurrences of external perturbations.
N = 64, M = 3, Pa = 0.2, β = 0.01, D=5, Lpre =
2048, Lint = 1024, Lfluct = 128, Ld = 8, θ = 0.1, φ = Emergence of State Transition
100, I = 32, E = 8, G = 256, Pcrossover = 0.75 and Cycles Induced by External
Pmutation = 0.05. Perturbations
We conducted 12 runs, and it turned out that
the fitness reached approximately 27 in 10 runs. In the previous section, we observed that the
The fitness of remaining 2 runs went up to about population successfully evolved, and individuals
21 or 25. Here, we focus on the run in which the were expected to exhibit self-organizing behaviors
fitness reached the best value among them. induced by perturbations. Next, we analyze the
Figure 5 shows the best, average and worst behavior of the adaptive individuals in detail, and
fitness at each generation. The best fitness was discuss their self-organizing properties.
about zero in the initial population and it rapidly Figure 7 illustrates the transitions of several
went up to approximately 19 until the 10th genera- indices during an evaluation of a typical individual
tion, then eventually converged to approximately
27 around the 50th generation. Also, we see the
average fitness tended to be a half of the best Figure 6. The difference between the scaled den-
sity distributions. The dots represent all values of
Equation (5) measured when all individuals were
Figure 5. The fitness of the population evaluated in each generation.

314
Evolutionary Search for Cellular Automata with Self-Organizing Properties

in the last generation of the same run as in Figure 5. tions among cells showed that if the number of the
The string of genes is ``020110010011110120010 subsequent dominant cell state exceeds a certain
111010121012010000000000’’2. The lower graph threshold by perturbations, it begins to increase
shows the transitions of the elements of density during the subsequent normal period.
distribution and the entropy3 during the evaluation. Figure 8 is the trajectory of the density dis-
The above images also illustrate the configuration tribution during this evaluation. We see that the
of cell states at the end of each period. density distribution showed a triangle cycle on
Through the preparation period, the configura- this space as a result of the emergent dynamics
tion of the CA gradually changed from the random explained above. This cycle is ascending in that
initial configuration to the stable one which was the value of the dominant cell state increases as
characterized by the decrease in the entropy. At the global configuration changes.
the end of the preparation period, there were small The global configuration of the CA in Figure
clusters of the state 1 in a sea of the state 0 (Fig- 7 and Figure 8 showed cyclic transitions between
ure 7 - 1). The configuration did not change through 6 different configurations occupied by almost one
the first normal period (2). cell state (see Figure 7 - 4) or two cell states (i.
Because the most of the cell states were 0, e. a number of state 2 in a sea of state 1, (6)).
the occurrences of the first perturbation period Because the scaling function increases the small
increased the density of the state 1 (3). Then, the density of the cell state, the actual differences in
clusters of 1 gradually expanded their size and these scaled density distributions become large,
finally occupied the whole configuration through and as a result, the differences between the sev-
the subsequent normal period (4). eral pairs (i. e. (6) and (12)) become the highest.
In contrast, the effect of the second perturbation Also, in each normal period, the global configu-
period was not strong. Although the density of the ration completely converges before the end of the
state 2 was increased (5), the global configuration period as shown in Figure 7. Thus, we can say
did not change any further (6). However the effect that the cyclic behavior emerged through the
of the third perturbation period (7) caused the course of evolution because of these adaptive and
significant change in the global configuration (8). stable properties.
The clusters of the state 2 appeared, expanded their On the other hand, we also observed another
size, and finally occupied the whole configuration. interesting rule at the last generation in other
As explained, we observed similar changes in the runs. The string of genes is ``112211000100211
dominant cell state (12) every two occurrences of 001212112222120222200000000000’’. Its typi-
perturbation periods (9 - 11). The detailed analyses cal behavior is illustrated in Figure 9 and Figure
on the effects of perturbations on the local interac- 10. As we can see from this figure, the density

Figure 7. The behavior of cellular automata (ascending cycle)

315
Evolutionary Search for Cellular Automata with Self-Organizing Properties

Figure 8. The trajectory of density distribution (ascending cycle)

distribution exhibited a reverse (descending) Stability of Emerged Cycles


cycle in comparison with the previous one. This
is an unexpected phenomenon because the global We obtained the emergent behaviors of CAs with
configuration changes in a descending order de- cyclic trajectories of the density distribution trig-
spite that perturbations increment the values of gered by perturbations. However, it needs more
cell states. The detailed analyses also showed consideration to discuss the stability of the cyclic
that the perturbations worked like a catalyst in behaviors of CAs during their long-term transitions
that the perturbed cells can decrement the values because the number of the perturbation periods
of cell states in neighbors. For example, if the in each evaluation process was merely 5. Conse-
configuration consists of a sea of the state 0, the quently, we conducted additional evaluations on
perturbed cells of 1 can change their neighboring the two typical rules explained in the previous
cell states of 0 to 2. section, in which the settings of the parameters
Among the successful runs, the rules of CA were the same as the previous experiments except
that exhibited the ascending cycles were clearly for the number of perturbation periods D = 17.
observed in 5 runs, and those exhibited the de- In order to understand the stability of transi-
scending cycles were obtained in 3 runs. We can tions between configurations, we calculated the
say that there were two opposite solutions to solve transition probability between configurations at
the task, but both rules cyclically changed the the steps when the global configurations were
global configuration of CAs by using their self- used for fitness evaluation (a(0), a(1), …, a(D) in
organizing properties that a drastic change in its Figure 3). Specifically, all global configurations
global state occurs every if the number of the were divided into 23 = 8 classes depending on
subsequent dominant cells goes beyond a certain whether each cell state exists or not on the global
threshold. configuration. The existence of a cell state on the

Figure 9. The behavior of cellular automata (descending cycle)

316
Evolutionary Search for Cellular Automata with Self-Organizing Properties

Figure 10. The trajectory of density distribution (descending cycle)

global configuration was decided by whether the set of existent cell states in the corresponding
density of the cell state can be increased by the class, and each value is the specific transition
scaling function Equation 4 or not4. For example, probability from the column to the row class. The
if a density distribution is {0.80, 0.15, 0.05} and transition diagram on the right also visualizes the
the scaled density distribution is {1.00, 0.99, same distribution probabilities, in which the line
0.01}, then it is regarded that the cell state 0 and types of arrows correspond to different ranges of
1 exist on the configuration. Then, we measured the value. As shown from the table and diagram,
the transition probabilities between the classes all the transition probabilities corresponding to
during the evaluation process. the ascending cycle {0} → {0, 1} → {1} → {1,
The table in Figure 11 displays the average tran- 2} → {2} → {2, 0} → …, were greater than 0.65.
sition probabilities between configurations over As above, the ascending cycle with 6 different
100 evaluations of the individual which showed configurations is stable through the long-term
the ascending cycle in the previous experiment. evaluation.
Each set of cell states in row and column is the

Figure 11. The transition table and diagram for global configuration (ascending cycle)

Figure 12. The transition table and diagram for global configuration (descending cycle)

317
Evolutionary Search for Cellular Automata with Self-Organizing Properties

The table in Figure 12 displays the transition configuration, expecting emergences of the self-
probability between configurations of the indi- organizing behaviors and the reduction of the
vidual which showed the descending cycle in the search space for a GA. We assumed a minimal
previous experiments. The transition probabilities task in which a CA has to change its global state
corresponding to the descending cycle {0} → {2, every perturbation, and then searched the rules for
0} → {2} → {1, 2} → {1} → {0, 1} → …, were CAs which can solve the task by using a GA. We
greater than 0.65 approximately, which are simi- obtained the rules for the CA in which the global
lar to the transition probabilities of the ascending configuration cyclically transited among different
cycle. The transition diagram clearly shows that stable configurations, and these stable configura-
the cycle is also stable while it is reversed com- tions composed of not only homogeneous but also
pared to the previous one. heterogeneous cell states. As a result, the number
of stable configurations became twice as that of
possible cell states. These interesting results were
CONCLUSION obtained only when we introduced the transitivity
(Equation (8)) into the rule of CAs. It should be
We have investigated emergent properties of CAs emphasized that we found both ascending and
induced by external perturbations. We introduced descending cycles of global configurations even
the transitivity into an extended version of outer- though a perturbation always increments the value
totalistic rules of CAs, and adopted the scaled of a cell state. Detailed analyses showed that the
density distribution of cell states as the global ascending cycle was due to the self-organizing

Figure 13. The emergent behaviors of the CAs in which the cells performed random walk on a two dimen-
sional space. The neighborhood of a cell is defined by all those cells that are within a specific distance.
Each cell can be regarded as a unit of decentralized mobile robots. We adopted the same rule as that of
the CA which exhibited an ascending cycle in Figure 7. The center graph shows the transitions of the
elements of density distribution during the evaluation. Each image also illustrates the configuration
of cell states at the end of each period. Each circle denotes a cell, and its color is assigned to its state.
The occurrence of the second perturbation period increased the density of the state 0 (2, 3). Then, the
clusters of 0 gradually expanded their size and finally occupied the whole configuration through the
subsequent normal period (4).

318
Evolutionary Search for Cellular Automata with Self-Organizing Properties

feature that a drastic change in its global state oc- REFERENCES


curs every when the accumulation of the perturbed
cells goes beyond a certain threshold. Also, the Ingerson, T. E., & Buvel, R. L. (1984). Struc-
descending cycle was due to the catalytic effect ture in Asynchronous Cellular Automata.
of perturbed cells which change cell states in their Physica D. Nonlinear Phenomena, 10, 59–68.
neighborhood into the dominant cell state in the doi:10.1016/0167-2789(84)90249-5
subsequent stable configuration. Iwase, Y., Suzuki, R., & Arita, T. (2007). Evolu-
Mamei et al. (2005) argued about applications tionary Search for Cellular Automata that Exhibit
of self-organizing features in the dissipative CAs Self-Organizing Properties Induced by External
for controlling decentralized pervasive systems. In Perturbations. In Proc. 2007 IEEE Congress
our model, we can regard the external perturbations on Evolutionary Computation (CEC2007) (pp.
as the signals for controlling the global behavior of 759-765).
decentralized systems. Our results imply that we
can dynamically control the global behaviors of Kwak, K. J., Baryshnikov, Y. M., & Coffman, E.
decentralized systems by changing several states G. (2008). Self-Organizing Sleep-Wake Sensor
of randomly selected components only. Systems. In Proc. the 2nd IEEE International
In real systems such as swarm robots or de- Conference on Self-Adaptive and Self-Organizing
centralized pervasive systems, the nodes are not Systems (SASO2008) (pp. 393-402).
regularly distributed in general. Thus, one might
Mamei, M., Roli, A., & Zambonelli, F. (2005).
assume that the dynamics of CAs could not be
Emergence and Control of Macro-Spatial
applied to these systems. However, Mamei et al.
Structures in Perturbed Cellular Automata, and
(2005) conducted additional experiments with
Implications for Pervasive Computing Systems.
CAs in which the cells were randomly distributed
IEEE Trans. Systems, Man, and Cybernetics .
and their neighborhood of a cell is defined by all
Part A: Systems and Humans, 35(3), 337–348.
those cells that are within a specific distance, and
doi:10.1109/TSMCA.2005.846379
showed that their schemes could be also adapted to
these CAs. As shown in Figure 13, we confirmed Marr, C., & Hütt, M. T. (2006). Similar Impact
that the self-organizing behaviors described above of Topological and Dynamic Noise on Complex
also appeared on a CA in which the cells performed Patterns. Physics Letters. [Part A], 349, 302–305.
random walk on a two dimensional space. doi:10.1016/j.physleta.2005.08.096
Future work includes investigating our model
Mitchell, M., Crutchfield, J. P., & Hraber, P. T.
with different several parameters, obtaining more
(1994). Evolving Cellular Automata to Perform
complex behaviors induced by different kinds of
Computations: Mechanisms and Impediments.
perturbations, such as conditional branches of the
Physica D. Nonlinear Phenomena, 75, 361–391.
behavior trajectories based on the kinds of per-
doi:10.1016/0167-2789(94)90293-3
turbations, and designing multi-agent systems
which make use of the self-organizing properties Nagata, H., Morita, S., Yoshimura, J., Nitta, T., &
of the CAs. Tainaka, K. (2008). Perturbation Experiments and
Fluctuation Enhancement in Finite Size of Lattice
Ecosystems: Uncertainty in Top-Predator Conser-
ACKNOWLEDGMENT vation. Ecological Informatics, 3(2), 191–201.
doi:10.1016/j.ecoinf.2008.01.005
This work was supported in part by a Grant-in-Aid
for 21st Century COE ``Frontiers of Computa-
tional Science’’.

319
Evolutionary Search for Cellular Automata with Self-Organizing Properties

Ninagawa, S. (2005). Evolving Cellular Automata by 1/f Noise. In Proc. the 8th European Conference on Artificial Life (ECAL2005) (pp. 453-460).

Ninagawa, S., Yoneda, M., & Hirose, S. (1997). Cellular Automata in Dissipative Boundary Conditions [in Japanese]. Transactions of Information Processing Society of Japan, 38(4), 927–930.

Rocha, L. M. (2004). Evolving Memory: Logical Tasks for Cellular Automata. In Proc. the 9th International Conference on the Simulation and Synthesis of Living Systems (ALIFE9) (pp. 256-261).

Roli, A., & Zambonelli, F. (2002). Emergence of Macro Spatial Structures in Dissipative Cellular Automata. In Proc. the 5th International Conference on Cellular Automata for Research and Industry (ACRI2002) (pp. 144-155).

Sakai, S., Nishinari, K., & Iida, S. (2006). A New Stochastic Cellular Automaton Model on Traffic Flow and Its Jamming Phase Transition. Journal of Physics A: Mathematical and General, 39(50), 15327–15339. doi:10.1088/0305-4470/39/50/002

Wolfram, S. (2002). A New Kind of Science. Wolfram Media Inc.

ENDNOTES

1. In the case of M = 3, the number of possible transition rules in our model comes down from $M^{M^{9}} = 3^{3^{9}} \approx 1.505 \times 10^{9391}$ to $M^{{}_{8+M-1}C_{M-1}} = 3^{{}_{10}C_{2}} = 3^{45} \approx 2.954 \times 10^{21}$.

2. Each value in the string represents a value of $\delta_0$ as follows: "$\delta_0(0,0,8)\ \delta_0(0,1,7)\ \ldots\ \delta_0(1,0,7)\ \delta_0(1,1,6)\ \ldots\ \delta_0(7,0,1)\ \delta_0(7,1,0)\ \delta_0(8,0,0)$".

3. The entropy of the global configuration $H$ is defined by $H = \frac{1}{N \times N} \sum_{(i,j) \in N \times N} H_{i,j}$, with $H_{i,j} = -\sum_{s=0}^{M-1} P_{i,j}(s) \log_2 P_{i,j}(s)$, where $H_{i,j}$ is the entropy of the cell $(i, j)$ and $P_{i,j}(s)$ is the probability of the occurrence of the cell state $s$ at $(i, j)$ during each period.

4. Actually, we defined that the density can be increased if the density is larger than the x-value (approximately 0.075) at the intersection of $y = SF_{0.1,100}(x)$ with $y = x$ around $\theta = 0.1$.
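As a worked check of the figures in endnotes 1 and 2, the short Python sketch below recomputes the sizes of the full and restricted rule spaces and enumerates the $(n_0, n_1, n_2)$ triples in the order used by the rule string. It is an illustration only; the variable names are ours, not the chapter's.

from math import comb, log10

M = 3  # number of cell states

# Unrestricted rule table over a 9-cell Moore neighborhood: M**(M**9) rules.
log10_full = (M ** 9) * log10(M)
print(f"full rule space ~ 10^{log10_full:.0f}")             # ~ 10^9391

# Restricted model: the next state depends only on how many of the 8
# neighbors are in each state, i.e. on triples (n0, n1, n2) with sum 8.
triples = [(a, b, 8 - a - b) for a in range(9) for b in range(9 - a)]
assert len(triples) == comb(8 + M - 1, M - 1) == 45
print(f"restricted rule space = {M ** len(triples):.3e}")   # ~ 2.954e+21

# Endnote 2's string lists the outputs delta0(n0, n1, n2) in exactly this
# lexicographic order: (0,0,8), (0,1,7), ..., (7,1,0), (8,0,0).
print(triples[0], triples[1], "...", triples[-2], triples[-1])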

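Endnote 3's entropy measure can be computed directly from an observed state history. The sketch below is one possible implementation, assuming the history is stored as a T x N x N integer array; the array layout and the function name are our own assumptions, not part of the chapter.

import numpy as np

def global_entropy(history):
    # history: integer array of shape (T, N, N) holding the state of every
    # cell of an N x N lattice at each of T observed steps (assumed layout).
    T, N, _ = history.shape
    M = int(history.max()) + 1
    H = 0.0
    for i in range(N):
        for j in range(N):
            # P_{i,j}(s): occurrence probability of state s at cell (i, j).
            p = np.bincount(history[:, i, j], minlength=M) / T
            p = p[p > 0]
            # H_{i,j} = -sum_s P_{i,j}(s) * log2 P_{i,j}(s)
            H += -(p * np.log2(p)).sum()
    return H / (N * N)

# Example: entropy of a uniformly random 16 x 16 lattice observed for 100
# steps (close to the per-cell maximum log2(3) for M = 3).
rng = np.random.default_rng(0)
print(global_entropy(rng.integers(0, 3, size=(100, 16, 16))))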

Compilation of References

Abbeel, P., & Ng, A. Y. (2005). Exploration and appren- Ando, Y., Masutani, O., Honiden, S., Fukazawa, Y., &
ticeship learning in reinforcement learning. In Proceedings Iwasaki, H. (2006). Performance of Pheromone Model
of the Twentyfirst International Conference on Machine for Predicting Traffic Congestion. In . Proceedings of
Learning (pp. 1-8). AAMAS, 2006, 73–80.

Abbott, A., Doering, C., Caves, C., Lidar, D., Brandt, Angeline, P. J., Sauders, G. M., & Pollack, J. B. (1994).
H., & Hamilton, A. (2003). Dreams versus Reality: Ple- An evolutionary algorithms that constructs recurrent
nary Debate Session on Quantum Computing. Quantum neural networks. IEEE Transactions on Neural Networks,
Information Processing, 2(6), 449–472. doi:10.1023/ 5, 54–65. doi:10.1109/72.265960
B:QINP.0000042203.24782.9a
Appleton-Young, L. (2008). 2008 real estate market
Aberdeen, D., & Baxter, J. (2002). Scalable Internal-State forecast. California Association of Realtors. Retrieved
Policy-Gradient Methods for POMDPs. In Proceedings December 2008, from http://bayareahousingreview.com/
of the Nineteenth International Conference on Machine wp-content/uploads/2008/02/ leslie_appleton_young
Learning (pp. 3-10). _preso _read-only1.pdf.

Acerbi, A., et al. (2007). Social Facilitation on the De- Arai, S. & Tanaka, N. (2006). Experimental Analysis
velopment of Foraging Behaviors in a Population of of Reward Design for Continuing Task in Multiagent
Autonomous Robots. In Proceedings of the 9th European Domains. Journal of Japanese Society for Artificial
Conference in Artificial Life (pp. 625-634). Intelligence, in Japanese, 13(5), 537-546.

Agogino, A. K., & Tumer, K. (2004). Unifying Temporal Aranha, C., & Iba, H. (2007). Portfolio Management by
and Structural Credit Assignment Problems. In Proceed- Genetic Algorithms with Error Modeling. In JCIS Online
ings of the Third International Joint Conference on Au- Proceedings of International Conference on Computa-
tonomous Agents and Multi-Agent Systems (pp. 980-987). tional Intelligence in Economics & Finance.

Agosta, W. (1992). Chemical Communication – The Arthur, W. B. (1993). On designing economic agents
Language of Pheromone. W. H. Freeman and Company. that behave like human agents. Journal of Evolutionary
Economics, 3, 1–22. doi:10.1007/BF01199986
Alfarano, S., Wagner, F., & Lux,T. (2004). Estimation of
Agent-Based Models: the case of an asymmetric herding Arthur, W. B., Holland, J. H., LeBaron, B., Palmer, R.
model. G., & Taylor, P. (1997). Asset Pricing under Endogenous
Expectations in an Artificial Stock Market. [Addison-
Ambler, S. (2008). Scaling Scrum – Meeting Real World
Wesley.]. The Economy as an Evolving Complex System,
Development Needs. Dr. Dobbs Journal. Retrieved
II, 15–44.
April 23, 2008 from http://www.drdobbsonline.net/
architect/207100381.


Atanassov, K. T. (1999). Intuitionistic Fuzzy Sets, Physica Bazerman, M. (1998). Judgment in Managerial Decision
Verlag. Heidelberg: Springer. Making. John Wiley & Sons.

Axelrod, R. (1997). The Complexity of Cooperation Becker, M., & Szczerbicka, H. (2005). Parameters In-
-Agent-Based Model of Competition and Collaboration. fluencing the Performance of Ant Algorithm Applied to
Princeton University Press. Optimisation of Buffer Size in Manufacturing. Industrial
Engineering and Management Systems, 4(2), 184–191.
Axtell, R. (2000). Why Agents? On the Varied Motiva-
tion For Agent Computing In the Social Sciences. The Beer, R. D. (1996). Toward the Evolution of Dynami-
Brookings Institution Center on Social and Economic cal Neural Networks for Minimally Cognitive. In From
Dynamics Working Paper, November, No.17. Animals to Animats 4: Proceedings of the Fourth Inter-
national Conference on Simulation of Adaptive Behavior
Bäck, T. (1996). Evolutionary Algorithms in Theory and
(pp. 421-429).
Practice: Evolution Strategies, Evolutionary Program-
ming, Genetic Algorithms. Oxford University Press. Benenti, G. (2004). Principles of Quantum Computation
and Information (Vol. 1). New Jersey: World Scientific.
Bagnall, A. J., & Smith, G. D. (2005). A Multi agent
Model of UK Market in Electricity Generation. IEEE Beni, G., & Wang, J. (1989). Swarm Intelligence in
Transactions on Evolutionary Computation, 522–536. Cellular Robotic Systems. In Proceed. NATO Advanced
doi:10.1109/TEVC.2005.850264 Workshop on Robots and Biological Systems, Tuscany,
Italy, June 26–30
Baird, L., & Poole, D. (1999). Gradient Descent for
General Reinforcement Learning. Advances in Neural Benjamin, D., Brown, S., & Shapiro, J. (2006). Who is ‘be-
Information Processing Systems, 11, 968–974. havioral’? Cognitive ability and anomalous preferences.
Levine’s Working Paper Archive 122247000000001334,
Baki, B., Bouzid, M., Ligęza, A., & Mouaddib, A. (2006).
UCLA Department of Economics.
A centralized planning technique with temporal constraints
and uncertainty for multi-agent systems. Journal of Ex- Bennett, F., III. (1996). Emergence of a Multi-Agent
perimental & Theoretical Artificial Intelligence, 18(3), Architecture and New Tactics for the Ant Colony Food
331–364. doi:10.1080/09528130600906340 Foraging Problem Using Genetic Programming. In From
Animals to Animats 4, Proceedings of the Fourth Interna-
Baldassarre, G., Nolfi, S., & Parisi, D. (2003).
tional Conference on Simulations of Adaptive Behavior
Evolving Mobile Robots Able to Display Collec-
(pp. 430–439).
tive Behaviours . Artificial Life, 9(3), 255–267.
doi:10.1162/106454603322392460 Binder, W. J., Hulaas, G., & Villazon, A. (2001). Portable
Resource Control in the J-SEAL2 Mobile Agent System. In
Baldassarre, G. (2007, June). Research on brain and be-
Proceedings of International Conference on Autonomous
haviour, and agent-based modelling, will deeply impact
Agents (pp. 222-223).
investigations on well-being (and theoretical economics).
Paper presented at International Conference on Policies Black, F., & Litterman, R. (1992, Sept/Oct). Global Port-
for Happiness, Certosa di Pontignano, Siena, Italy. folio Optimization. Financial Analysts Journal, 28–43.
doi:10.2469/faj.v48.n5.28
Barto, A. (1996). Muti-agent reinforcement learning and
adaptive neural networks. Retrieved December 2008, from Blynel, J., & Floreano, D. (2003). Exploring the T-Maze:
http://stinet.dtic.mil/cgi-bin/GetTRDoc?AD=ADA3152 Evolving Learning-Like Robot Behaviors using CTRNNs.
66&Location=U2&doc =GetTRDoc.pdf. In Proceedings of the 2nd European Workshop on Evo-
lutionary Robotics (EvoRob’2003) (LNCS).


Boehm, B., & Turner, R. (2004). Balancing Agility and C.A.R. (2008). U.S. economic outlook: 2008. Retrieved
discipline: A Guide for the Perplexed. Addison-Wesley December 2008, from http://rodomino.realtor.org/
Press. Research.nsf/files/ currentforecast.pdf/$FILE/current-
forecast.pdf.
Bonabeau, E., Dorigo, M., & Theraulaz, G. (1999). Swarm
Intelligence from Natural to Artificial Systems. Oxford Callebaut, W., & Rasskin-Gutman, D. (Eds.). (2005).
University Press. Modularity: Understanding the development and evolution
of natural complex systems. MA: MIT Press.
Bornholdt, S. (2001). Expectation bubbles in a spin model
of markets. International Journal of Modern Physics C, Caplin, A., & Dean, M. (2008). Economic insights from
12(5), 667–674. doi:10.1142/S0129183101001845 ``neuroeconomic’’ data. The American Economic Review,
98(2), 169–174. doi:10.1257/aer.98.2.169
Bossaerts, P., Beierholm, U., Anen, C., Tzieropoulos, H.,
Quartz, S., de Peralta, R., & Gonzalez, S. (2008, Septem- Carnap, R., & Jeffrey, R. (1971). Studies in Inductive
ber). Neurobiological foundations for “dual system”’ Logics and Probability (Vol. 1, pp. 35–165). Berkeley,
theory in decision making under uncertainty: fMRI and CA: University of California Press.
EEG evidence. Paper presented at Annual Conference on
Casari, M. (2004). Can genetic algorithms explain ex-
Neuroeconomics, Park City, Utah.
perimental anomalies? An application to common prop-
Bostonbubble.com. (2007). S&P/Case-Shiller Boston erty resources. Computational Economics, 24, 257–275.
snapshot Q3 2007. Retrieved December 2008, from http:// doi:10.1007/s10614-004-4197-5
www.bostonbubble.com/forums/viewtopic.php?t=598.
Case, K., Glaeser, E., & Parker, J. (2000). Real estate
Boswijk H. P., Hommes C. H, & Manzan, S. (2004). and the macroeconomy. Brookings Papers on Economic
Behavioral Heterogeneity in Stock Prices. Activity, 2, 119–162. doi:.doi:10.1353/eca.2000.0011

Boutilier, C., & Poole, D. (1996). Computing Optimal Case, K., & Shiller, R. (1989). The efficiency of the
Policies for Partially Observable Decision Processes market for single-family homes. The American Economic
using Compact Representations. In Proceedings of the Review, 79, 125–137.
Thirteenth National Conference on Artificial Intelligence
Case, K., & Shiller, R. (1990). Forecasting prices and
(pp. 1168-1175).
excess returns in the housing market. American Real
Brocas, I., & Carrillo, J. (2008a). The brain as a hierarchi- Estate and Urban Economics Association Journal, 18,
cal organization. The American Economic Review, 98(4), 263–273. doi:.doi:10.1111/1540-6229.00521
1312–1346. doi:10.1257/aer.98.4.1312
Case, K., & Shiller, R. (2003). Is there a bubble in the
Brocas, I., & Carrillo, J. (2008b). Theories of the mind. housing market? Brookings Papers on Economic Activity,
American Economic Review: Papers\& Proceedings, 1, 299–342. doi:.doi:10.1353/eca.2004.0004
98(2), 175-180.
Chalkiadakis, G., & Boutilier, C. (2008). Sequential Deci-
Brunnermeier, M. K. (2001). Asset Pricing under sion Making in Repeated Coalition Formation under Un-
Asymmetric Information. Oxford University Press. certainty, In: Proc. of 7th Int. Conf. on Autonomous Agents
doi:10.1093/0198296983.001.0001 and Multi-agent Systems (AA-MAS 2008), Padgham,
Parkes, Müller and Parsons (eds.), May, 12-16, 2008,
Budhraja, V. S. (2001). California’s electricity crisis. IEEE
Estoril, Portugal, http://eprints.ecs.soton.ac.uk/15174/1/
Power Engineering Society Summer Meeting.
BayesRLCF08.pdf


Chan, N. T., LeBaron, B., Lo, A. W., & Poggio, T. (2008). Chen, S.-H., & Tai, C.-C. (2003). Trading restrictions,
Agent-based models of financial markets: A comparison price dynamics and allocative efficiency in double auction
with experimental markets. MIT Artificial Markets Proj- markets: an analysis based on agent-based modeling and
ect, Paper No. 124, September. Retrieved January 1, 2008, simulations. Advances in Complex Systems, 6(3), 283–302.
from http://citeseer.ist.psu.edu/chan99agentbased.html. doi:10.1142/S021952590300089X

Chang, T. J., Meade, N., Beasley, J. E., & Sharaiha, Y. M. Chen, S.-H., Zeng, R.-J., & Yu, T. (2009a). Analysis of
(2000). Heuristics for Cardinality Constrained Portfolio Micro-Behavior and Bounded Rationality in Double
Optimization . Computers & Operations Research, 27, Auction Markets Using Co-evolutionary GP . In Pro-
1271–1302. doi:10.1016/S0305-0548(99)00074-X ceedings of World Summit on Genetic and Evolutionary
Computation. ACM.
Charness, G., & Levin, D. (2005). When optimal choices
feel wrong: A laboratory study of Bayesian updating, Chen, S., & Yeh, C. (1996). Genetic programming learning
complexity, and affect. The American Economic Review, and the cobweb model . In Angeline, P. (Ed.), Advances in
95(4), 1300–1309. doi:10.1257/0002828054825583 Genetic Programming (Vol. 2, pp. 443–466). Cambridge,
MA: MIT Press.
Chattoe, E. (1998). Just how (un)realistic are evolutionary
algorithms as representations of social processes? Journal Chen, S.-H., Zeng, R.-J., & Yu, T. (2009). Co-evolving
of Artificial Societies and Social Simulation, 1. trading strategies to analyze bounded rationality in double
auction markets . In Riolo, R., Soule, T., & Worzel, B.
Chen, S. H., & Yeh, C. H. (2002). On the Emergent
(Eds.), Genetic Programming: Theory and Practice VI (pp.
Properties of Artificial Stock Markets: The Efficient
195–213). Springer. doi:10.1007/978-0-387-87623-8_13
Market Hypothesis and the Rational Expectations Hypoth-
esis. Journal of Behavior &Organization, 49, 217–239. Chen, S.-H., Zeng, R.-J., & Yu, T. (2008). Co-evolving
doi:10.1016/S0167-2681(02)00068-9 trading strategies to analyze bounded rationality in double
auction markets . In Riolo, R., Soule, T., & Worzel, B.
Chen, S.-H. (2008). Software-agent designs in econom-
(Eds.), Genetic Programming Theory and Practice VI
ics: An interdisciplinary framework. IEEE Computa-
(pp. 195–213). Springer.
tional Intelligence Magazine, 3(4), 18–22. doi:10.1109/
MCI.2008.929844 Chen, S.-H., & Chie, B.-T. (2007). Modularity, product
innovation, and consumer satisfaction: An agent-based
Chen, S.-H., & Chie, B.-T. (2004). Agent-based economic
approach . In Yin, H., Tino, P., Corchado, E., Byrne, W.,
modeling of the evolution of technology: The relevance
& Yao, X. (Eds.), Intelligent Data Engineering and Auto-
of functional modularity and genetic programming.
mated Learning (pp. 1053–1062). Heidelberg, Germany:
International Journal of Modern Physics B, 18(17-19),
Springer. doi:10.1007/978-3-540-77226-2_105
2376–2386. doi:10.1142/S0217979204025403
Chen, L., Xu, X., & Chen, Y. (2004). An adaptive ant
Chen, S.-H., & Huang, Y.-C. (2008). Risk preference,
colony clustering algorithm. In Proceedings of the Third
forecasting accuracy and survival dynamics: Simulations
IEEE International Conference on Machine Learning and
based on a multi-asset agent-based artificial stock market.
Cybernetics (pp. 1387-1392).
Journal of Economic Behavior & Organization, 67(3),
702–717. doi:10.1016/j.jebo.2006.11.006 Chen, S., & Yeh, C. (1995). Predicting stock returns
with genetic programming: Do the short-run nonlinear
regularities exist? In D. Fisher (Ed.), Proceedings of the
Fifth International Workshop on Artificial Intelligence
and Statistics (pp. 95-101). Ft. Lauderdale, FL.


Chen, S.-H., Chie, B.-T., & Tai, C.-C. (2001). Evolv- Colorni, A., Dorigo, M., & Maniezzo, V. (1991). Dis-
ing bargaining strategies with genetic programming: tributed Optimization by Ant Colonies. In . Proceedings,
An overview of AIE-DA Ver. 2, Part 2. In B. Verma & ECAL91, 134–142.
A. Ohuchi (Eds.), Proceedings of Fourth International
Colyvan, M. (2004). The Philosophical Significance of
Conference on Computational Intelligence and Multi-
Cox’s Theorem. International Journal of Approximate
media Applications (ICCIMA 2001) (pp. 55–60). IEEE
Reasoning, 37(1), 71–85. doi:10.1016/j.ijar.2003.11.001
Computer Society Press.
Colyvan, M. (2008). Is Probability the Only Coherent
Chen, X., & Tokinaga, S. (2006). Analysis of price
Approach to Uncertainty? Risk Analysis, 28, 645–652.
fluctuation in double auction markets consisting of
doi:10.1111/j.1539-6924.2008.01058.x
multi-agents using the genetic programming for learning.
Retrieved from https://qir.kyushuu.ac.jp/dspace/bitstream Commonweal of Australia. (2001). Economic Outlook.
/2324/8706/ 1/ p147-167.pdf. Retrieved December 2008, from http://www.budget.gov.
au/2000-01/papers/ bp1/html/bs2.htm.
China Bystanders. (2008). Bank profits trimmed by
subprime losses. Retrieved from http://chinabystander. Cont, R., & Bouchaud, J.-P. (2000). Herd behavior and
wordpress.com /2008/03/25/bank-profits-trimmed-by- aggregate fluctuations in financial markets. Macro-
subprime-losses/. economics Dynamics, 4, 170–196.

Chrisman, L. (1992). Reinforcement Learning with Per- Covaci, S. (1999). Autonomous Agent Technology. In
ceptual Aliasing: The Perceptual Distinctions Approach. Proceedings of the 4th international symposium on Au-
In Proceedings of the Tenth National Conference on tonomous Decentralized Systems. Washington, DC: IEEE
Artificial Intelligence (pp. 183-188). Computer Science Society.

Cincotti, S., Focardi, S., Marchesi, M., & Raberto, M. Croley, T., & Lewis, C. (2006).. . Journal of Great
(2003). Who wins? Study of long-run trader survival Lakes Research, 32, 852–869. doi:10.3394/0380-
in an artificial stock market. Physica A, 324, 227–233. 1330(2006)32[852:WADCTM]2.0.CO;2
doi:10.1016/S0378-4371(02)01902-7
D’Espagnat, B. (1999). Conceptual Foundation of Quan-
Cliff, D., Harvey, I., & Husbands, P. (1993). Explora- tum mechanics (2nd ed.). Perseus Books.
tions in Evolutionary Robotics . Adaptive Behavior, 2(1),
d’Acremont, M., & Bossaerts, P. (2008, September).
71–104. doi:10.1177/105971239300200104
Grasping the fundamental difference between expected
Cliff, D., & Bruten, J. (1997). Zero is not enough: On the utility and mean-variance theories. Paper presented at
lower limit of agent intelligence for continuous double Annual Conference on Neuroeconomics, Park City, Utah.
auction markets (Technical Report no. HPL-97-141).
Das, R., Hanson, J. E., Kephart, J. O., & Tesauro, G.
Hewlett-Packard Laboratories. Retrieved January 1, 2008,
(2001). Agent-human interactions in the continuous
from http://citeseer.ist.psu.edu/cliff97zero.html
double auction. In Proceedings of the 17th International
CME 2007. (n.d.). Retrieved December 2008, from http:// Joint Conference on Artificial Intelligence (IJCAI), San
www.cme.com/trading/prd/re/housing.html. Francisco. CA: Morgan-Kaufmann.

Collins, R., & Jeffersion, D. (1991). AntFarm: Towards De Long, J. B., Shleifer, A. L., Summers, H., & Wald-
Simulated Evolution. In Artificial Life II, Proceedings mann, R. J. (1991). The survival of noise traders in
of the Second International Conference on Artificial Life financial markets. The Journal of Business, 64(1), 1–19.
(pp. 159–168). doi:10.1086/296523


Deneuburg, J., Goss, S., Franks, N., Sendova-Franks, Durlauf, S. N., & Young, H. P. (2001). Social Dynamics.
A., Detrain, C., & Chretien, L. (1991). The Dynamics of Brookings Institution Press.
Collective Sorting: Robot-Like Ant and Ant-Like Robot.
Easley, D., & Ledyard, J. (1993). Theories of price forma-
In Proceedings of First Conference on Simulation of Adap-
tion and exchange in double oral auction . In Friedman, D.,
tive Behavior: From Animals to Animats (pp. 356-363).
& Rust, J. (Eds.), The Double Auction Market-Institutions,
Cambridge: MIT Press.
Theories, and Evidence. Addison-Wesley.
Detterman, D. K., & Daniel, M. H. (1989). Correlations of
Economist.com. (2007). The world economy: Rocky
mental tests with each other and with cognitive variables
terrain ahead. Retrieved December 2008, from
are highest for low-IQ groups. Intelligence, 13, 349–359.
http://www.economist.com/ daily/news/displaystory.
doi:10.1016/S0160-2896(89)80007-8
cfm?storyid=9725432&top_story=1.
Devetag, G., & Warglien, M. (2003). Games and phone
Edmonds, B. (2002). Review of Reasoning about Ratio-
numbers: Do short-term memory bounds affect strategic
nal Agents by Michael Wooldridge. Journal of Artificial
behavior? Journal of Economic Psychology, 24, 189–202.
Societies and Social Simulation, 5(1). Retrieved from
doi:10.1016/S0167-4870(02)00202-7
http://jasss.soc.surrey.ac.uk/5/1/reviews/edmonds.html.
Devetag, G., & Warglien, M. (2008). Playing the wrong
Edmonds, B. (1998). Modelling socially intelligent
game: An experimental analysis of relational complexity
agents. Applied Artificial Intelligence, 12, 677–699.
and strategic misrepresentation. Games and Economic
doi:10.1080/088395198117587
Behavior, 62, 364–382. doi:10.1016/j.geb.2007.05.007
Ellis, C., Kenyon, I., & Spence, M. (1990). Occasional
Dimeas, A. L., & Hatziargyriou, N. D. (2007). Agent based
Publication of the London Chapter . OAS, 5, 65–124.
control of Virtual Power Plants. International Conference
on Intelligent Systems Applications to Power Systems. Elton, E., Gruber, G., & Blake, C. (1996). Survivorship
Bias and Mutual Fund Performance. Review of Financial
DiVincenzo, D. (1995). Quantum Computation. Science,
Studies, 9, 1097–1120. doi:10.1093/rfs/9.4.1097
270(5234), 255–261. doi:10.1126/science.270.5234.255
Epstein, J. M., & Axtell, R. (1996). Growing Artificial
DiVincenzo, D. (2000). The Physical Implementation
Societies Social Science From the The Bottom Up. MIT
of Quantum Computation. Experimental Proposals for
Press.
Quantum Computation. arXiv:quant-ph/0002077
Evolution Robotics Ltd. Homepage (2008). Retrieved
Dorigo, M., & Gambardella, L. M. (1996). Ant Colony
from http://www.evolution.com/
System: a Cooperative Learning Approach to the Traveling
Salesman . IEEE Transactions on Evolutionary Computa- Fagin, R., & Halpern, J. (1994). Reasoning about Knowl-
tion, 1(1), 53–66. doi:10.1109/4235.585892 edge and Probability. Journal of the ACM, 41(2), 340–367.
doi:10.1145/174652.174658
Dorigo, M., & Stutzle, T. (2004). Ant Colony Optimiza-
tion. Cambridge, MA: The MIT Press. Fair, R., & Jaffee, D. (1972). Methods of estimation for
markets in disequilibrium. Econometrica, 40, 497–514.
Dorigo, M., Maniezzo, V., & Colorni, A. (1991). Positive
doi:.doi:10.2307/1913181
Feedback as a Search Strategy (Technical Report No.
91-016). Politecnico di Milano. Fama, E. (1970). Efficient Capital Markets: A Review of
Theory and Empirical Work. The Journal of Finance, 25,
Duffy, J. (2006). Agent-based models and human subject
383–417. doi:10.2307/2325486
experiments . In Tesfatsion, L., & Judd, K. (Eds.), Hand-
book of Computational Economics (Vol. 2). North Holland.


Feldman, J. (1962). Computer simulation of cognitive Gigerenzer, G., & Selten, R. (2002). Bounded Rationality.
processes . In Broko, H. (Ed.), Computer applications Cambridge: The MIT Press.
in the behavioral sciences. Upper Saddle River, NJ:
Gjerstad, S., & Dickhaut, J. (1998). Price formation in
Prentice Hall.
double auctions. Games and Economic Behavior, 22,
Ferber, J. (1999). Multi Agent Systems. Addison Wesley. 1–29. doi:10.1006/game.1997.0576

Feynman, R. (1982). Simulating physics with computers. Gode, D. K., & Sunder, S. (1993). Allocative efficiency
International Journal of Theoretical Physics, 21, 467. of markets with zero-intelligence traders: markets as a
doi:10.1007/BF02650179 partial substitute for individual rationality. The Journal
of Political Economy, 101, 119–137. doi:10.1086/261868
Figner, B., Johnson, E., Lai, G., Krosch, A., Steffener,
J., & Weber, E. (2008, September). Asymmetries in Gode, D., & Sunder, S. (1993). Allocative efficiency
intertemporal discounting: Neural systems and the direc- of markets with zero-intelligence traders: Market as a
tional evaluation of immediate vs future rewards. Paper partial substitute for individual rationality. The Journal
presented at Annual Conference on Neuroeconomics, of Political Economy, 101, 119–137. doi:10.1086/261868
Park City, Utah.
Goldberg, D. E. (1989). Genetic Algorithms in Search,
Fischhoff, B. (1991). Value elicitation: Is there anything Optimization and Machine Learning. Addison-Wesley.
in there? The American Psychologist, 46, 835–847.
Gomez, F. J. and Miikkulainen, R. (1999). Solving
doi:10.1037/0003-066X.46.8.835
Non-Markovian Control Tasks with Neuroevolution, In
Flament, C. (1963). Applications of graphs theory to Proceedings of the International Joint Conference on
group structure. London: Prentice Hall. Artificial Intelligence (pp. 1356-1361).

Freddie Mac. (2008a). CMHPI data. Retrieved Decem- Gosavi, A. (2004). A Reinforcement Learning Algorithm
ber 2008, from http://www.freddiemac.com/finance/ Based on Policy Iteration for Average Reward: Em-
cmhpi/#old. pirical Results with Yield Management and Convergence
Analysis. Machine Learning, 55, 5–29. doi:10.1023/
Freddie Mac. (2008b). 30-year fixed rate historical Tables.
B:MACH.0000019802.64038.6c
Historical PMMS® Data. Retrieved December 2008,
from http://www.freddiemac.com/pmms/pmms30.htm. Gottfredson, L. S. (1997). Mainstream science on intel-
ligence: An editorial with 52 signatories, history, and
Frederick, S., Loewenstein, G., & O’Donoghue, T.
bibliography. Intelligence, 24(1), 13–23. doi:10.1016/
(2002). Time discounting and time preference: A critical
S0160-2896(97)90011-8
review. Journal of Economic Literature, XL, 351–401.
doi:10.1257/002205102320161311 Grefenstette, J. J. (1988). Credit Assignment in Rule Dis-
covery Systems Based on Genetic Algorithms. Machine
Friedman, D. (1991). A simple testable model of double
Learning, 3, 225–245. doi:10.1007/BF00113898
auction markets. Journal of Economic Behavior & Orga-
nization, 15, 47–70. doi:10.1016/0167-2681(91)90004-H Gregg, L., & Simon, H. (1979). Process models and
stochastic theories of simple concept formation. In H.
Friedman, M. (1953). Essays in Positive Economics.
Simon, Models of Thought (Vol. I). New Haven, CT:
University of Chicago Press.
Yale Uniersity Press.
Fudenberg, D., & Levine, D. (2006). A dual-self model of
impulse control. The American Economic Review, 96(5),
1449–1476. doi:10.1257/aer.96.5.1449


Grossklags, J., & Schmidt, C. (2006). Software agents and Hiroshi, I., & Masahito, H. (2006). Quantum Computation
market (in)efficiency—a human trader experiment. IEEE and Information. Berlin: Springer.
Transactions on System, Man, and Cybernetics: Part C .
Hisdal, E. (1998). Logical Structures for Representation
Special Issue on Game-theoretic Analysis & Simulation
of Knowledge and Uncertainty. Springer.
of Negotiation Agents, 36(1), 56–67.
Holland, J. H. (1975). Adaptation in Natural and Artificial
Group, C. M. E. (2007). S&P/Case-Shiller Price Index:
Systems. University of Michigan Press.
Futures and options. Retrieved December 2008, from
http://housingderivatives. typepad.com/housing_deriva- Hough, J. (1958). Geology of the Great Lakes. [Univ. of
tives/files/cme_housing _fact_sheet.pdf. Illinois Press.]. Urbana (Caracas, Venezuela), IL.

Gruber, M. J. (1996). Another Puzzle: The Growth in Housing Predictor. (2008). Independent real estate hous-
Actively Managed Mutual Funds. The Journal of Finance, ing forecast. Retrieved December 2008, from http://www.
51(3), 783–810. doi:10.2307/2329222 housingpredictor.com/ california.html.

Haji, K. (2007). Subprime mortgage crisis casts a Hunt, E. (1995). The role of intelligence in modern society.
global shadow – medium-term economic forecast (FY American Scientist, (July/August): 356–368.
2007~2017). Retrieved December 2008, from http://www.
Huynh, T. D., Jennings, N. R., & Shadbolt, N. R. (2006).
nli-research.co.jp/english/economics/2007/ eco071228.
An integrated trust and reputation model for open multi-
pdf.
agent systems. Journal of Autonomous agents and multi
Halpern, J. (2005). Reasoning about uncertainty. MIT agent systems.
Press.
Iacono, T. (2008). Case-Shiller® Home Price Index fore-
Hanaki, N. (2005). Individual and social learning. casts: Exclusive house-price forecasts based on Fiserv’s
Computational Economics, 26, 213–232. doi:10.1007/ leading Case-Shiller Home Price Indexes. Retrieved
s10614-005-9003-5 December 2008, from http://www.economy.com/home/
products/ case_shiller_indexes.asp.
Harmanec, D., Resconi, G., Klir, G. J., & Pan, Y. (1995).
On the computation of uncertainty measure in Dempster- Ingerson, T. E., & Buvel, R. L. (1984). Structure in Asyn-
Shafer theory. International Journal of General Systems, chronous Cellular Automata. Physica D. Nonlinear Phe-
25(2), 153–163. doi:10.1080/03081079608945140 nomena, 10, 59–68. doi:10.1016/0167-2789(84)90249-5

Harvey, I., Di Paolo, E., Wood, A., & Quinn, R., M., & Iwase, Y., Suzuki, R., & Arita, T. (2007). Evolutionary
Tuci, E. (2005). Evolutionary Robotics: A New Scientific Search for Cellular Automata that Exhibit Self-Organizing
Tool for Studying Cognition. Artificial Life, 11(3/4), Properties Induced by External Perturbations. In Proc.
79–98. doi:10.1162/1064546053278991 2007 IEEE Congress on Evolutionary Computation
(CEC2007) (pp. 759-765).
Harvey, I., Husbands, P., Cliff, D., Thompson, A., &
Jakobi, N. (1997). Evolutionary robotics: The sussex ap- Iyengar, S., & Lepper, M. (2000). When choice is demoti-
proach. Robotics and Autonomous Systems, 20, 205–224. vating: Can one desire too much of a good thing? Journal
doi:10.1016/S0921-8890(96)00067-X of Personality and Social Psychology, 79(6), 995–1006.
doi:10.1037/0022-3514.79.6.995
Hatziargyriou, N. D., Dimeas, A., Tsikalakis, A. G., Lopes,
J. A. P., Kariniotakis, G., & Oyarzabal, J. (2005). Manage- Jaakkola, T., Singh, S. P., & Jordan, M. I. (1994). Rein-
ment of Microgrids in Market Environment. International forcement Learning Algorithm for Partially Observable
Conference on Future Power Systems. Markov Decision Problems. Advances in Neural Informa-
tion Processing Systems, 7, 345–352.


Jaeger, G. (2006). Quantum Information: An Overview. Kahneman, D., Diener, E., & Schwarz, N. (Eds.). (2003).
Berlin: Springer. Well-Being: The Foundations of Hedonic Psychology.
New York, NY: Russell Sage Foundation.
Jamison, J., Saxton, K., Aungle, P., & Francis, D. (2008).
The development of preferences in rat pups. Paper pre- Kahneman, D., Knetsch, J., & Thaler, R. (1990). Experi-
sented at Annual Conference on Neuroeconomics, Park mental tests of the endowment effect and the Coase theo-
City, Utah. rem. The Journal of Political Economy, 98, 1325–1348.
doi:10.1086/261737
Jayantilal, A., Cheung, K. W., Shamsollahi, P., & Bre-
sler, F. S. (2001). Market Based Regulation for the PJM Kahneman, D., Knetsch, J., & Thaler, R. (1991). Anoma-
Electricity Market. IEEE International Conference on lies: The endowment effect, loss aversion, and status
Innovative Computing for Power Electric Energy Meets quo bias. The Journal of Economic Perspectives, 5(1),
the Markets (pp. 155-160). 193–206.

Jevons, W. (1879). The Theory of Political Economy, Kahneman, D., Ritov, I., & Schkade, D. (1999). Economic
2nd Edtion. Edited and introduced by R. Black (1970). preferences or attitude expressions? An analysis of dollar
Harmondsworth: Penguin. responses to public issues. Journal of Risk and Uncertainty,
19, 203–235. doi:10.1023/A:1007835629236
Johnson, E., Haeubl, G., & Keinan, A. (2007). Aspects
of endowment: A query theory account of loss aversion Kahneman, D. (2003). Maps of Bounded Rational-
for simple objects. Journal of Experimental Psychol- ity: Psychology for Behavioral Economics. The
ogy. Learning, Memory, and Cognition, 33, 461–474. American Economic Review, 93(5), 1449–1475.
doi:10.1037/0278-7393.33.3.461 doi:10.1257/000282803322655392

Johnson, N., Jeffries, P., & Hui, P. M. (2003). Financial Kahneman, D., & Tversky, A. (1979). Prospect Theory
Market Complexity. Oxford. of Decisions under Risk. Econometrica, 47, 263–291.
doi:10.2307/1914185
Jurca, R., & Faltings, B. (2003). Towards Incentive-
Compatible Reputation Management. Trust, Reputation Kahneman, D., & Tversky, A. (1992). Advances in. pros-
and Security: Theories and Practice (LNAI 2631, pp. pect Theory: Cumulative representation of Uncertainty.
138-147). Journal of Risk and Uncertainty, 5.

Kaboudan, M. (2001). Genetically evolved models Kaizoji. T, Bornholdt, S. & Fujiwara.Y. (2002). Dynam-
and normality of their residuals. Journal of Economic ics of price and trading volume in a spin model of stock
Dynamics & Control, 25, 1719–1749. doi:.doi:10.1016/ markets with heterogeneous agent. Physica A.
S0165-1889(00)00004-X
Kambayashi, Y., & Takimoto, M. (2005). Higher-Order
Kaboudan, M. (2004). TSGP: A time series genetic pro- Mobile Agents for Controlling Intelligent Robots. Inter-
gramming software. Retrieved December 2008, from national Journal of Intelligent Information Technologies,
http://bulldog2.redlands.edu/ fac/mak_kaboudan/tsgp. 1(2), 28–42.

Kagan, H. (2006). The Psychological Immune System: Kambayashi, Y., Sato, O., Harada, Y., & Takimoto, M.
A New Look at Protection and Survival. Bloomington, (2009). Design of an Intelligent Cart System for Common
IN: AuthorHouse. Airports. In Proceedings of 13th International Symposium
on Consumer Electronics. CD-ROM.
Kagel, J. (1995). Auction: A survey of experimental re-
search . In Kagel, J., & Roth, A. (Eds.), The Handbook
of Experimental Economics. Princeton University Press.


Kambayashi, Y., Tsujimura, Y., Yamachi, H., Takimoto, Kovalerchuk, B., & Vityaev, E. (2000). Data mining in fi-
M., & Yamamoto, H. (2009). Design of a Multi-Robot nance: advances in relational and hybrid methods. Kluwer.
System Using Mobile Agents with Ant Colony Cluster-
Kovalerchuk, B. (1990). Analysis of Gaines’ logic of
ing. In Proceedings of Hawaii International Conference
uncertainty, In I.B. Turksen (Ed.), Proceedings of NAFIPS
on System Sciences. IEEE Computer Society. CD-ROM
’90 (Vol. 2, pp. 293-295).
Kawamura, H., Yamamoto, M., & Ohuchi, A. (2001).
Koza, J. (1992). Genetic programming. Cambridge, MA:
(in Japanese). Investigation of Evolutionary Pheromone
The MIT Press.
Communication Based on External Measurement and
Emergence of Swarm Intelligence. Japanese Journal of Koza, J. R. (1992). Genetic Programming: On the Pro-
the Society of Instrument and Control Engineers, 37(5), gramming of Computers by Means of Natural Selection.
455–464. MIT Press.

Kawamura, H., & Yamamoto, M. Suzuki & Ohuchi, Krishna, V., & Ramesh, V. C. (1998). Intelligent
A. (1999). Ants War with Evolutive Pheromone Style agents for negotiations in market games. Part I. Model.
Communication. In Advances in Artificial Life, ECAL’99 IEEE Transactions on Power Systems, 1103–1108.
(LNAI 1674, pp. 639-643). doi:10.1109/59.709106

Kennedy, J., & Eberhert, R. C. (2001). Swarm Intelligence. Krishna, V., & Ramesh, V. C. (1998a). Intelligent agents
Morgan Kaufmann. for negotiations in market games. Part II. Application.
IEEE Transactions on Power Systems, 1109–1114.
Kepecs, A., Uchida, N., & Mainen, Z. (2008, September).
doi:10.1109/59.709107
How uncertainty boosts learning: Dynamic updating of
decision strategies. Paper presented at Annual Conference Kuhlmann, G., & Stone, P. (2003). Progress in learning
on Neuroeconomics, Park City, Utah. 3 vs. 2 keepaway. In Proceedings of the RoboCup-2003
Symposium.
Kimura, H., Yamamura, M., & Kobayashi, S. (1995).
Reinforcement Learning by Stochastic Hill Climbing on Kuhnen, C., & Chiao, J. (2008, September). Genetic
Discounted Reward. In Proceedings of the Twelfth Inter- determinants of financial risk taking. Paper presented at
national Conference on Machine Learning (pp. 295-303). Annual Conference on Neuroeconomics, Park City, Utah.

Klucharev, V., Hytonen, K., Rijpkema, M., Smidts, A., & Kwak, K. J., Baryshnikov, Y. M., & Coffman, E. G. (2008).
Fernandez, G. (2008, September). Neural mechanisms of Self-Organizing Sleep-Wake Sensor Systems. In Proc.
social decisions. Paper presented at Annual Conference the 2nd IEEE International Conference on Self-Adaptive
on Neuroeconomics, Park City, Utah. and Self-Organizing Systems (SASO2008) (pp. 393-402).

Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-Critic Kyle, A. S., & Wang, A. (1997). Speculation Duopoly
Algorithms. Advances in Neural Information Processing with Agreement to Disagree: Can Overconfidence Survive
Systems, 12, 1008–1014. the Market Test? The Journal of Finance, 52, 2073–2090.
doi:10.2307/2329474
Koritarov, V. S. (2004). Real-World Market Representa-
tion with Agents (pp. 39–46). IEEE Power and Energy Laibson, D. (1997). Golden eggs and hyperbolic discount-
Magazine. ing. The Quarterly Journal of Economics, 12(2), 443–477.
doi:10.1162/003355397555253
Kovalerchuk, B. (1996). Context spaces as necessary
frames for correct approximate reasoning. Interna-
tional Journal of General Systems, 25(1), 61–80.
doi:10.1080/03081079608945135


Lam, K., & Leung, H. (2004). An Adaptive Strategy Lewis, C. (2007).. . Journal of Paleolimnology, 37,
for Trust/ Honesty Model in Multi-agent Semi- com- 435–452. doi:10.1007/s10933-006-9049-y
petitive Environments. In Proceedings of the 16 th
Lichtenstein, S., & Slovic, P. (Eds.). (2006). The Construc-
IEEE International Conference on Tools with Artificial
tion of Preference. Cambridge, UK: Cambridge University
Intelligence(ICTAI 2004)
Press. doi:10.1017/CBO9780511618031
Larsen, C. (1999). Cranbrook Institute of Science . Bul-
Lieberman, M. (2003). Reflective and reflexive judgment
letin, 64, 1–30.
processes: A social cognitive neuroscience approach . In
Lasseter, R., Akhil, A., Marnay, C., Stephens, J., Dagle, Forgas, J., Williams, K., & von Hippel, W. (Eds.), Social
J., Guttromson, R., et al. (2002, April). White paper on Judgments: Explicit and Implicit Processes (pp. 44–67).
Integration of consortium Energy Resources. The CERTS New York, NY: Cambridge University Press.
MicroGrid Concept. CERTS, CA, Rep.LBNL-50829.
Liepins, G. E., Hilliard, M. R., Palmer, M., & Rangarajan,
Le Baron, B. (2001). A builder’s guide to agent-based G. (1989). Alternatives for Classifier System Credit As-
financial markets. Quantitative Finance, 1(2), 254–261. signment. In Proceedings of the Eleventh International
doi:10.1088/1469-7688/1/2/307 Joint Conference on Artificial Intelligent (pp. 756-761).

LeBaron, B. (2000). Agent-based Computational Finance: Lin, C. C., & Liu, Y. T. (2008). Genetic Algorithms for
Suggested Readings and Early Research. Journal of Portfolio Selection Problems with Minimum Transaction
Economics & Control, 24, 679–702. doi:10.1016/S0165- Lots. European Journal of Operational Research, 185(1),
1889(99)00022-6 393–404. doi:10.1016/j.ejor.2006.12.024

LeBaron, B., Arthur, W. B., & Palmer, R. (1999). Time Lin, C.-H., Chiu, Y.-C., Lin, Y.-K., & Hsieh, J.-C. (2008,
Series Properties of an Artificial Stock Market. Journal September). Brain maps of Soochow Gambling Task. Pa-
of Economics & Control, 23, 1487–1516. doi:10.1016/ per presented at Annual Conference on Neuroeconomics,
S0165-1889(98)00081-5 Park City, Utah.

Lerner, J., Small, D., & Loewenstein, G. (2004). Heart Liu, Y., & Yao, X. (1996). A Population-Based Learning
strings and purse strings: Carry-over effects of emotions Algorithms Which Learns Both Architectures and Weights
on economic transactions. Psychological Science, 15, of Neural Networks. Chinese Journal of Advanced Soft-
337–341. doi:10.1111/j.0956-7976.2004.00679.x ware Research, 3(1), 54–65.

Levy, M., Levy, H., & Solomon, S. (2000). Microscopic Lo, A. (2005). Reconciling efficient markets with be-
Simulation of Financial Markets. Academic Press. havioral finance: The adaptive market hypothesis. The
Journal of Investment Consulting, 7(2), 21–44.
Levy, M. Levy, H., & Solomon, S. (2000). Microscopic
Simulation of Financial Markets: From Investor Behav- Loewenstein, G. (1988). Frames of mind in intertemporal
ior to Market Phenomena. San Diego: Academic Press. choice. Management Science, 34, 200–214. doi:10.1287/
mnsc.34.2.200
Lewandowsky, S., Oberauer, K., Yang, L.-X., & Ecker,
U. (2009). A working memory test battery for Matlab. Loewenstein, G. (2005). Hot-cold empathy gaps and
under prepartion for being submitted to the Journal of medical decision making. Health Psychology, 24(4),
Behavioral Research Method. S49–S56. doi:10.1037/0278-6133.24.4.S49

Lewis, C. (1994).. . Quaternary Science Reviews, 13,


891–922. doi:10.1016/0277-3791(94)90008-6


Loewenstein, G., & Schkade, D. (2003). Wouldn’t it Mamei, M., Roli, A., & Zambonelli, F. (2005). Emergence
be nice?: Predicting future feelings . In Kahneman, D., and Control of Macro-Spatial Structures in Perturbed
Diener, E., & Schwartz, N. (Eds.), Hedonic Psychology: Cellular Automata, and Implications for Pervasive Com-
The Foundations of Hedonic Psychology (pp. 85–105). puting Systems. IEEE Trans. Systems, Man, and Cyber-
New York, NY: Russell Sage Foundation. netics . Part A: Systems and Humans, 35(3), 337–348.
doi:10.1109/TSMCA.2005.846379
Loewenstein, G., & O’Donoghue, T. (2005). Animal
spirits: Affective and deliberative processes in economic Mamei, M. & Zambonelli, F. (2007). Pervasive pher-
behavior. Working Paper. Carnegie Mellon University, omone-based interaction with RFID tags. ACM Trans-
Pittsburgh. actions on Autonomous and Adaptive Systems (TAAS)
archive, 2(2).
Logenthiran, T., Srinivasan, D., & Wong, D. (2008).
Multi-agent coordination for DER in MicroGrid. IEEE Manson, S. M. (2006). Bounded rationality in agent-
International Conference on Sustainable Energy Tech- based models: experiments with evolutionary programs.
nologies (pp. 77-82). International Journal of Geographical Information Sci-
ence, 20(9), 991–1012. doi:10.1080/13658810600830566
Louie, K., Grattan, L., & Glimcher, P. (2008). Value-
based gain control: Relative reward normalization in Markowitz, H. (1952). Portfolio Selection. The Journal
parietal cortex. Paper presented at Annual Conference of Finance, 7, 77–91. doi:10.2307/2975974
on Neuroeconomics, Park City, Utah.
Markowitz, H. (1987). Mean-Variance Analysis in
Lovis, W. (1989). Michigan Cultural Resource Investiga- Portfolio Choice and Capital Market. New York: Basil
tions Series 1, East Lansing. Blackwell.

Ludwig, A., & Torsten, S. (2001). The impact of stock Marr, C., & Hütt, M. T. (2006). Similar Impact of To-
prices and house prices on consumption in OECD coun- pological and Dynamic Noise on Complex Patterns.
tries. Retrieved December 2008, from http://www.vwl. Physics Letters. [Part A], 349, 302–305. doi:10.1016/j.
uni-mannheim.de/brownbag/ludwig.pdf. physleta.2005.08.096

Lumer, E. D., & Faieta, B. (1994). Diversity and Adapta- McCallum, R. A. (1995). Instance-Based Utile Distinc-
tion in Populations of Clustering Ants. In From Animals to tions for Reinforcement Learning with Hidden State. In
Animats 3: Proceedings of the 3rd International Conference Proceedings of the Twelfth International Conference on
on the Simulation of Adaptive Behavior (pp. 501-508). Machine Learning (pp. 387-395).
Cambridge: MIT Press.
McClure, S., Laibson, D., Loewenstein, G., & Cohen,
Lux, T., & Marchesi, M. (1999). Scaling and criticality J. (2004). Separate neural systems value immediate
in a stochastic multi-agent model of a financial market. and delayed monetary rewards. Science, 306, 503–507.
Nature, 397, 498–500. doi:10.1038/17290 doi:10.1126/science.1100907

MacLean, P. (1990). The Triune Brain in Evolution: Role Merrick, K., & Maher, M. L. (2007). Motivated Reinforce-
in Paleocerebral Function. New York, NY: Plenum Press. ment Learning for Adaptive Characters in Open-Ended
Simulation Games. In Proceedings of the International
Malkiel, B. (1995). Returns from Investing in Equity
Conference on Advanced in Computer Entertainment
Mutual Funds 1971 to 1991. The Journal of Finance, 50,
Technology (pp. 127-134).
549–572. doi:10.2307/2329419


Mitchell, M., Crutchfield, J. P., & Hraber, P. T. (1994). Mondada, F., & Floreano, D. (1995). Evolution of neural
Evolving Cellular Automata to Perform Computations: control structures: Some experiments on mobile robots.
Mechanisms and Impediments. Physica D. Nonlinear Phe- Robotics and Autonomous Systems, 16(2-4), 183–195.
nomena, 75, 361–391. doi:10.1016/0167-2789(94)90293- doi:10.1016/0921-8890(96)81008-6
3
Money, C. N. N. com (2008). World economy on thin ice
Miyazaki, K., & Kobayashi, S. (2001). Rationality of - U.N.: The United Nations blames dire situation on the
Reward Sharing in Multi-agent Reinforcement Learning. decline of the U.S. housing and financial sectors. Retrieved
New Generation Computing, 91, 157–172. doi:10.1007/ December 2008, from http://money.cnn.com/2008/05 /15/
BF03037252 news/ international/global_economy.ap/.

Miyazaki, K., & Kobayashi, S. (2003). An Extension of Montero, J., Gomez, D., & Bustine, H. (2007). On the
Profit Sharing to Partially Observable Markov Decision relevance of some families of fuzzy sets. Fuzzy Sets and
Processes: Proposition of PS-r* and its Evaluation. [in Systems, 16, 2429–2442. doi:10.1016/j.fss.2007.04.021
Japanese]. Journal of the Japanese Society for Artificial
Moody’s. Economy.com (2008). Case-Shiller® Home
Intelligence, 18(5), 286–296. doi:10.1527/tjsai.18.286
Price Index forecasts. Moody’s Analytics, Inc. Retrieved
Miyazaki, K., & Kobayashi, S. (1998). Learning Deter- December 2008, from http://www.economy.com/home/
ministic Policies in Partially Observable Markov Decision products/case_shiller_indexes.asp.
Processes. In Proceedings of the Fifth International Con-
Moore, T., Rea, D., Mayer, L., Lewis, C., & Dobson,
ference on Intelligent Autonomous System (pp. 250-257).
D. (1994).. . Canadian Journal of Earth Sciences, 31,
Miyazaki, K., & Kobayashi, S. (2000). Reinforcement 1606–1617. doi:10.1139/e94-142
Learning for Penalty Avoiding Policy Making. In Pro-
Murphy, R. R. (2000). Introduction to AI robotics. Cam-
ceedings of the 2000 IEEE International Conference on
bridge: MIT Press.
Systems, Man and Cybernetics (pp. 206-211).
Nagata, H., Morita, S., Yoshimura, J., Nitta, T., & Tainaka,
Miyazaki, K., Yamaumra, M., & Kobayashi, S. (1994).
K. (2008). Perturbation Experiments and Fluctuation
On the Rationality of Profit Sharing in Reinforcement
Enhancement in Finite Size of Lattice Ecosystems: Un-
Learning. In Proceedings of the Third International Con-
certainty in Top-Predator Conservation. Ecological Infor-
ference on Fuzzy Logic, Neural Nets and Soft Computing
matics, 3(2), 191–201. doi:10.1016/j.ecoinf.2008.01.005
(pp. 285-288).
Nagata, T., Takimoto, M., & Kambayashi, Y. (2009).
Modigliani, F., & Miller, M. H. (1958). The Cost of Capital,
Suppressing the Total Costs of Executing Tasks Using
Corporation Finance and the Theory of Investment. The
Mobile Agents. In Proceedings of the 42nd Hawaii Inter-
American Economic Review, 48(3), 261–297.
national Conference on System Sciences, IEEE Computer
Mohr, P., Biele, G., & Heekeren, H. (2008, September). Society. CD-ROM.
Distinct neural representations of behavioral risk and
Nakamura, M., & Kurumatani, K. (1997). Formation
reward risk. Paper presented at Annual Conference on
Mechanism of Pheromone Pattern and Control of Forag-
Neuroeconomics, Park City, Utah.
ing Behavior in an Ant Colony Model. In Proceedings
Monaghan, G., & Lovis, W. (2005). Modeling Archaeo- of the Fifth International Workshop on the Synthesis and
logical Site Burial in Southern Michigan. East Lansing, Simulation of Living Systems (pp. 67 -74).
MI: Michigan State Univ. Press.


National Association of Home Builders, The Hous- Orito, Y., Takeda, M., & Yamamoto, H. (2009). Index
ing Policy Department. (2005). The local impact of Fund Optimization Using Genetic Algorithm and Scatter
home building in a typical metropolitan area: In- Diagram Based on Coefficients of Determination. Studies
come, jobs, and taxes generated. Retrieved December in Computational Intelligence: Intelligent and Evolution-
2008, from http://www.nahb.org/fileUpload_details. ary Systems, 187, 1–11.
aspx?contentTypeID=3&contentID= 35601& subCon-
Orito, Y., & Yamamoto, H. (2007). Index Fund Optimi-
tentID=28002.
zation Using a Genetic Algorithm and a Heuristic Local
NeuroSolutionsTM (2002). The Neural Network Simula- Search Algorithm on Scatter Diagrams. In Proceedings
tion Environment. Version 3, NeuroDimensions, Inc., of 2007 IEEE Congress on Evolutionary Computation
Gainesville, FL. (pp. 2562-2568).

Ng, A. Y. Ng & Russell, S. (2000). Algorithms for Inverse Palmer, R. G., Arthur, W. B., Holland, J. H., LeBaron, B., &
Reinforcement Learning. In Proceedings of 17th Interna- Tayler, P. (1994). Artificial economic life: a simple model
tional Conference on Machine Learning (pp. 663-670). of a stock market. Physica D. Nonlinear Phenomena,
Morgan Kaufmann, San Francisco, CA. 75(1-3), 264–274. doi:10.1016/0167-2789(94)90287-9

Ng, A. Y., Harada, D., & Russell, S. J. (1999). Policy Palmer, R. G., Arthur, W. B., Holland, J. H., LeBaron,
Invariance Under Reward Transformations: Theory and B., & Tayler, P. (1994). Artificial economic life: A simple
Application to Reward Shaping. In Proceedings of the model of a stock market. Physica D. Nonlinear Phenom-
Sixteenth International Conference on Machine Learn- ena, 75, 264–274. doi:10.1016/0167-2789(94)90287-9
ing (pp. 278-287).
Patel, J. (2007). A Trust and Reputation Model For Agent-
Nielsen, M., & Chuang, I. (2000). Quantum Computa- Based Virtual Organizations. Phd thesis in the faculty of
tion and Quantum Information. Cambridge: Cambridge Engineering and Applied Science School of Electronics
University Press. and Computer Sciences University of South Hampton
January 2007.
Ninagawa, S., Yoneda, M., & Hirose, S. (1997). Cellular
Automata in Dissipative Boundary Conditions [in Japa- Paulsen, D., Huettel, S., Platt, M., & Brannon, E. (2008,
nese]. Transactions of Information Processing Society of September). Heterogeneity in risky decision making
Japan, 38(4), 927–930. in 6-to-7-year-old children. Paper presented at Annual
Conference on Neuroeconomics, Park City, Utah.
Ninagawa, S. (2005). Evolving Cellular Automata by 1/f
Noise. In Proc. the 8th European Conference on Artificial Payne, J., Bettman, J., & Johnson, E. (1993). The Adaptive
Life (ECAL2005) (pp. 453-460). Decision Maker. Cambridge University Press.

Oh, K. J., Kim, T. Y., & Min, S. (2005). Using Genetic Pearson, J., Hayden, B., Raghavachari, S., & Platt, M.
Algorithm to Support Portfolio Optimization for Index (2008) Firing rates of neurons in posterior cingulate
Fund Management . Expert Systems with Applications, cortex predict strategy-switching in a k-armed bandit task.
28, 371–379. doi:10.1016/j.eswa.2004.10.014 Paper presented at Annual Conference on Neuroeconom-
ics, Park City, Utah.
Ohkura, K., Yasuda, T., Kawamatsu, Y., Matsumura, Y.,
& Ueda, K. (2007). MBEANN: Mutation-Based Evolv- Perkins, T. J. (2002). Reinforcement Learning for POM-
ing Artificial Neural Networks. In Proceedings of the 9th DPs based on Action Values and Stochastic Optimization.
European Conference in Artificial Life (pp. 936-945). In Proceedings of the Eighteenth National Conference on
Artificial Intelligence (pp. 199-204).


Praça, I., Ramos, C., Vale, Z., & Cordeiro, M. (2003). Resconi, G., & Jain, L. (2004). Intelligent agents. Springer
MASCEM: A Multi agent System That Simulates Com- Verlag.
petitive Electricity Markets. IEEE International confer-
Resconi, G., Klir, G. J., Harmanec, D., & St. Clair, U.
ence on Intelligent Systems (pp. 54-60).
(1996). Interpretation of various uncertainty theories us-
Preuschoff, K., Bossaerts, P., & Quartz, S. (2006). ing models of modal logic: a summary. Fuzzy Sets and
Neural Differentiation of Expected Reward and Risk in Systems, 80, 7–14. doi:10.1016/0165-0114(95)00262-6
Human Subcortical Structures. Neuron, 51(3), 381–390.
Resconi, G., Klir, G. J., & St. Clair, U. (1992). Hierar-
doi:10.1016/j.neuron.2006.06.024
chical uncertainty metatheory based upon modal logic.
Priest, G., & Tanaka, K. Paraconsistent Logic. (2004). International Journal of General Systems, 21, 23–50.
Stanford Encyclopedia of Philosophy. http://plato.stan- doi:10.1080/03081079208945051
ford.edu/entries/logic-paraconsistent.
Resconi, G., Murai, T., & Shimbo, M. (2000). Field
Principe, J., Euliano, N., & Lefebvre, C. (2000). Neural Theory and Modal Logic by Semantic field to make
and Adaptive Systems: Fundamentals through Simula- Uncertainty Emerge from Information. Interna-
tions. New York: John Wiley & Sons, Inc. tional Journal of General Systems, 29(5), 737–782.
doi:10.1080/03081070008960971
Pushkarskaya, H., Liu, X., Smithson, M., & Joseph, J.
(2008, September). Neurobiological responses in individu- Resconi, G., & Turksen, I. B. (2001). Canonical Forms
als making choices in uncertain environments: Ambiguity of Fuzzy Truthoods by Meta-Theory Based Upon Modal
and conflict. Paper presented at Annual Conference on Logic. Information Sciences, 131, 157–194. doi:10.1016/
Neuroeconomics, Park City, Utah. S0020-0255(00)00095-5

Quinn, M., & Noble, J. (2001). Modelling Animal Behav- Resconi, G., & Kovalerchuk, B. (2006). The Logic of
iour in Contests: Tactics, Information and Communication. Uncertainty with Irrational Agents In Proc. of JCIS-
In Advances in Artificial Life: Sixth European Conference 2006 Advances in Intelligent Systems Research, Taiwan.
on Artificial Life (ECAL 01), (LNAI). Atlantis Press

Raberto, M., Cincotti, S., Focardi, M., & Marchesi, M. Resconi, G., Klir, G.J., St. Clair, U., & Harmanec, D.
(2001). Agent-based simulation of a financial market. (1993). The integration of uncertainty theories. Intern. J.
Physica A, 299(1-2), 320–328. doi:10.1016/S0378- Uncertainty Fuzziness knowledge-Based Systems, 1, 1-18.
4371(01)00312-0
Reynolds, R. G., & Ali, M. (2008). Computing with the
Rahman, S., Pipattanasomporn, M., & Teklu, Y. (2007). In- Social Fabric: The Evolution of Social Intelligence within
telligent Distributed Autonomous Power System (IDAPS). a Cultural Framework. IEEE Computational Intelligence
IEEE Power Engineering Society General Meeting. Magazine, 3(1), 18–30. doi:10.1109/MCI.2007.913388

Ramchurn, S. D., Huynh, D., & Jennings, N. R. (2004). Reynolds, R. G., Ali, M., & Jayyousi, T. (2008). Mining
Trust in multiagent Systems. The Knowledge Engineering the Social Fabric of Archaic Urban Centers with Cultural
Review, 19(1), 1–25. doi:10.1017/S0269888904000116 Algorithms. IEEE Computer, 41(1), 64–72.

Rangel, A., Camerer, C., & Montague, R. (2008). A Rocha, L. M. (2004). Evolving Memory: Logical Tasks
framework for studying the neurobiology of value-based for Cellular Automata. In Proc. the 9th International
decision making. Nature Reviews. Neuroscience, 9, Conference on the Simulation and Synthesis of Living
545–556. doi:10.1038/nrn2357 Systems (ALIFE9) (pp. 256-261).


Roli, A., & Zambonelli, F. (2002). Emergence of Macro Sato, O., Ugajin, M., Tsujimura, Y., Yamamoto, H., &
Spatial Structures in Dissipative Cellular Automata. Kambayashi, Y. (2007). Analysis of the Behaviors of
In Proc. the 5th International Conference on Cellular Multi-Robots that Implement Ant Colony Clustering
Automata for Research and Industry (ACRI2002) (pp. Using Mobile Agents. In Proceedings of the Eighth Asia
144-155). Pacific Industrial Engineering and Management System.
CD-ROM.
Roth, A. E., & Ockenfels, A. (2002). Last-minute bid-
ding and the rules for ending second-price auction: Satoh, I. (1999). A Mobile Agent-Based Framework for
evidence from Ebay and Amazon auctions on the Inter- Active Networks. In Proceedings of IEEE Systems, Man,
net. The American Economic Review, 92, 1093–1103. and Cybernetics Conference (pp. 161-168).
doi:10.1257/00028280260344632
Sauter, J., Matthews, R., Parunak, H., & Brueckner, S.
Ruspini, E. H. (1999). A new approach to clustering. (2005). Performance of digital pheromones for swarming
Information and Control, 15, 22–32. doi:10.1016/S0019- vehicle control. In Proceedings of the fourth international
9958(69)90591-9 joint conference on Autonomous agents and multiagent
systems (pp. 903-910).
Russell, S., & Norvig, P. (1995). Artificial Intelligence.
Prentice-Hall. Schlosser, A., Voss, M., & Bruckner, L. (2004). Com-
paring and evaluating metrics for reputation systems by
Rust, J., Miller, J., & Palmer, R. (1994). Characterizing
simulation. Paper presented at RAS-2004, A Workshop
effective trading strategies: Insights from a computer-
on Reputation in Agent Societies as part of 2004 IEEE/
ized double auction tournament. Journal of Economic
WIC/ACM International Joint Conference on Intelligent
Dynamics & Control, 18, 61–96. doi:10.1016/0165-
Agent Technology (IAT’04) and Web Intelligence (WI’04),
1889(94)90069-8
Beijing China, September 2004.
Rust, J., Miller, J., & Palmer, R. (1993). Behavior of trad-
Schwartz, B. (2003). The Paradox of Choice: Why More
ing automata in a computerized double auction market . In
Is Less. New York, NY: Harper Perennial.
Friedman, D., & Rust, J. (Eds.), Double Auction Markets:
Theory, Institutions, and Laboratory Evidence. Redwood Shahidehpour, M., & Alomoush, M. (2001). Restructured
City, CA: Addison Wesley. Electrical Power Systems: Operation, Trading, and Vola-
tility. Marcel Dekker Inc.
Rutledge, R., Dean, M., Caplin, A., & Glimcher, P. (2008,
September). A neural representation of reward predic- Shahidehpour, M., Yamin, H., & LI Z. (2002). Market
tion error identified using an axiomatic model. Paper Operations in Electric Power Systems: Forecasting,
presented at Annual Conference on Neuroeconomics, Scheduling, and Risk Management. Wiley-IEEE Press.
Park City, Utah.
Shannon, C., & Weaver, W. (1964). The Mathematical
Sakai, S., Nishinari, K., & Iida, S. (2006). A New Theory of Communication. The University of Illinois Press.
Stochastic Cellular Automaton Model on Traffic Flow
Sharot, T., De Martino, B., & Dolan, R. (2008, September)
and Its Jamming Phase Transition. Journal of Physics.
Choice shapes, and reflects, expected hedonic outcome.
A, Mathematical and General, 39(50),15327–15339.
Paper presented at Annual Conference on Neuroeconom-
doi:10.1088/0305-4470/39/50/002
ics, Park City, Utah.
Samanez Larkin, G., Kuhnen, C., & Knutson, B. (2008).
Sharpe, W. F. (1964). Capital Asset Prices: A Theory of
Financial decision making across the adult life span. Pa-
Market Equilibrium under condition of Risk. The Journal
per presented at Annual Conference on Neuroeconomics,
of Finance, 19, 425–442. doi:10.2307/2977928
Park City, Utah.

336
Compilation of References

Sheely, T. (1995). The Wisdom of the Hive: The Social Singh, S. P., Jaakkola, T., & Jordan, M. I. (1994). Learn-
Physiology of Honey Bee Colonies. Harvard University ing Without State-Estimation in Partially Observable
Press. Markovian Decision Processes. In Proceedings of the
Eleventh International Conference on Machine Learning
Shiller, R. J. (2000). Irrational Exuberance. Princeton
(pp. 284-292).
University Press.
Slovic, P. (1995). The construction of preference. The
Shleifer, A. (2000). Inefficient Markets. Oxford University
American Psychologist, 50, 364–371. doi:10.1037/0003-
Press. doi:10.1093/0198292279.001.0001
066X.50.5.364
Shott, M. (1999). Cranbrook Institute of Science . Bul-
Smith, V. (1976). Experimental economics: induced value
letin, 64, 71–82.
theory. The American Economic Review, 66(2), 274–279.
Shrestha, G. B., Song, K., & Goel, L. K. (2000). An Ef-
Sole, R., Bonabeau, E., Delgado, J., Fernandez, P., &
ficient Power Pool Simulator for the Study of Competi-
Marin, J. (2000). Pattern Formation and Optimization
tive Power Market. Power Engineering Society Winter
in Army Raids. [The MIT Press.]. Artificial Life, 6(3),
Meeting.
219–226. doi:10.1162/106454600568843
Simon, H. (1955). A behavioral model of rational choice.
Sornette, D. (2003). Why stock markets crash. Princeton
The Quarterly Journal of Economics, 69, 99–118.
University Press.
doi:10.2307/1884852
Standard & Poor’s. (2008a). S&P/Case-Shiller® Home
Simon, H. (1956). Rational choice and the structure of
Price Indices Methodology. Standard & Poor’s. Retrieved
the environment. Psychological Review, 63, 129–138.
December 2008, from http://www2.standardandpoors.
doi:10.1037/h0042769
com/spf/pdf/index/SP_CS_Home_ Price_Indices_ Meth-
Simon, H. (1965). The architecture of complexity. General odology_Web.pdf.
Systems, 10, 63–76.
Standard & Poor’s. (2008b). S&P/Case-Shiller Home Price
Simon, H. (1981). Studying human intelligence by creating Indices. Retrieved December 2008, from http://www2.
artificial intelligence. American Scientist, 69, 300–309. standardandpoors.com/ portal/site/sp/en/us/page.topic/
indices_csmahp/ 2,3,4,0,0,0,0,0,0,1,1,0,0,0,0,0.html.
Simon, H. (1996). The Sciences of the Artificial. Cam-
bridge, MA: MIT Press. Stanley, K., & Miikkulainen, R. (2002). Evolv-
ing neural networks through augmenting topolo-
Simon, H. A. (1997). Behavioral economics and bounded
gies . Evolutionary Computation, 10(2), 99–127.
rationality . In Simon, H. A. (Ed.), Models of Bounded
doi:10.1162/106365602320169811
Rationality (pp. 267–298). MIT Press.
Stark, T. (2008). Survey of professional forecasters: May
Simon, H. (2005). Darwinism, altruism and econom-
13, 2008. Federal Reserve Bank of Philadelphia. Retrieved
ics. In: K. Dopfer (Ed.), The Evolutionary Foundations
December 2008, from http://www.philadelphiafed.org/
of Economics (89-104), Cambridge, UK: Cambridge
files/spf/survq208.html
University Press.
Stolze, J., & Suter, D. (2004). Quantum Computing.
Singh, S. P., & Sutton, R. S. (1996). Reinforcement Learn-
Wiley-VCH. doi:10.1002/9783527617760
ing with Replacing Eligibility Traces. Machine Learning,
22(1-3), 123–158. doi:10.1007/BF00114726

337
Compilation of References

Stone, P., Sutton, R. S., & Kuhlmann, G. (2005). Suzuki, K., & Ohuchi, A. (1997). Reorganization of
Reinforcement Learning for RoboCup Soccer Agents with Pheromone Style Communication in Mulltiple
Keepaway. Adaptive Behavior, 13(3), 165–188. Monkey Banana Problem. In . Proceedings of Intelligent
doi:10.1177/105971230501300301 Autonomous Systems, 5, 615–622.

Stone, P., & Sutton, R. S. (2002). Keepaway Soccer: a Takahashi, H., & Terano, T. (2003). Agent-Based Ap-
machine learning testbed . In Birk, A., Coradeschi, S., & proach to Investors’ Behavior and Asset Price Fluctuation
Tadokoro, S. (Eds.), RoboCup-2001: Robot Soccer World in Financial Markets. Journal of Artificial Societies and
Cup V (pp. 214–223). doi:10.1007/3-540-45603-1_22 Social Simulation, 6(3).

Stone, P., Kuhlmann, G., Taylor, M. E., & Liu, Y. (2006). Takahashi, H., & Terano, T. (2004). Analysis of Micro-
Keepaway Soccer: From Machine Learning Testbed to Macro Structure of Financial Markets via Agent-Based
Benchmark . In Noda, I., Jacoff, A., Bredenfeld, A., & Model: Risk Management and Dynamics of Asset Pricing.
Takahashi, Y. (Eds.), RoboCup-2005: Robot Soccer World Electronics and Communications in Japan, 87(7), 38–48.
Cup IX. Berlin: Springer Verlag. doi:10.1007/11780519_9
Takahashi, H., Takahashi, S., & Terano, T. (2007). Ana-
Streichert, F., & Tanaka-Yamawaki, M. (2006). The Effect lyzing the Influences of Passive Investment Strategies
of Local Search on the Constrained Portfolio Selection on Financial Markets via Agent-Based Modeling . In
Problem. In Proceedings of 2006 IEEE Congress on Edmonds, B., Hernandez, C., & Troutzsch, K. G. (Eds.),
Evolutionary Computation (pp. 2368-2374). Social Simulation- Technologies, Advances, and New
Discoveries (pp. 224–238). Hershey, PA: Information
Sueyoshi, T., & Tadiparthi, G. R. (2007). Agent-based
Science Reference.
approach to handle business complexity in U.S. wholesale
power trading. IEEE Transactions on Power Systems, Takahashi, H. (2010), “An Analysis of the Influence of
532–543. doi:10.1109/TPWRS.2007.894856 Fundamental Values’ Estimation Accuracy on Financial
Markets, ” Journal of Probability and Statistics, 2010.
Sun, R., & Qi, D. (2001). Rationality Assumptions and
Optimality of Co-learning, In Design and Applications Takahashi, H., & Terano, T. (2006a). Emergence of
of Intelligent Agents (LNCS 1881, pp. 61-75). Berlin/ Overconfidence Investor in Financial markets. 5th In-
Heidelberg: Springer. ternational Conference on Computational Intelligence
in Economics and Finance.
Sutton, R., & Barto, A. G. (1998). Reinforcement Learning:
An Introduction. Cambridge, MA: MIT Press. Takahashi, H., & Terano, T. (2006b). Exploring Risks of
Financial Markets through Agent-Based Modeling. In
Sutton, R. S. (1988). Learning to Predict by the Methods
Proc. SICE/ICASS 2006 (pp. 939-942).
of Temporal Differences. Machine Learning, 3, 9–44.
doi:10.1007/BF00115009 Takimoto, M., Mizuno, M., Kurio, M., & Kambayashi,
Y. (2007). Saving Energy Consumption of Multi-Robots
Sutton, R. S., & Barto, A. (1998). Reinforcement Learning:
Using Higher-Order Mobile Agents. In Proceedings of
An Introduction. Cambridge, MA: MIT Press.
the First KES International Symposium on Agent and
Sutton, R. S., McAllester, D., Singh, S. P., & Mansour, Multi-Agent Systems: Technologies and Applications
Y. (2000). Policy Gradient Methods for Reinforcement (LNAI 4496, pp. 549-558).
Learning with Function Approximation. Advances in
Neural Information Processing Systems, 12, 1057–1063.

338
Compilation of References

Taniguchi, K., Nakajima, Y., & Hashimoto, F. (2004). U.S. Bureau of Economic Analysis. (2008). Regional
A report of U-Mart experiments by human agents . In economic accounts: State personal income. Retrieved
Shiratori, R., Arai, K., & Kato, F. (Eds.), Gaming, Simula- December 2008, from http://www.bea.gov/regional/sqpi/
tions, and Society: Research Scope and Perspective (pp. default.cfm?sqtable=SQ1.
49–57). Springer.
U.S. Census Bureau. (2008a). Housing vacancies and
Terano, T., Nishida, T., Namatame, A., Tsumoto, S., home ownership. Retrieved December 2008, from http://
Ohsawa, Y., & Washio, T. (Eds.). (2001). New Frontiers www.census.gov/hhes/ www/histt10.html.
in Artificial Intelligence. Springer Verlag. doi:10.1007/3-
U.S. Census Bureau. (2008b). New residential construc-
540-45548-5
tion. Retrieved December 2008, from http://www.census.
Terano, T. (2007a). Exploring the Vast Parameter Space of gov/const/www/newresconstindex_excel.html.
Multi-Agent Based Simulation. In L. Antunes & K. Taka-
Ugajin, M., Sato, O., Tsujimura, Y., Yamamoto, H., Ta-
dama (Eds.), Proc. MABS 2006 (LNAI 4442, pp. 1-14).
kimoto, M., & Kambayashi, Y. (2007). Integrating Ant
Terano, T. (2007b). KAIZEN for Agent-Based Model- Colony Clustering Method to Multi-Robots Using Mobile
ing. In S. Takahashi, D. Sallach, & J. Rouchier (Eds.), Agents. In Proceedings of the Eigth Asia Pacific Industrial
Advancing Social Simulation -The First Congress- (pp. Engineering and Management System. CD-ROM.
1-6). Springer Verlag.
van Dinther, C. (2007). Adaptive Bidding in Single-Sided
Terano, T., Deguchi, H., & Takadama, K. (Eds.). (2003), Auctions under Uncertainty: An Agent-based Approach
Meeting the Challenge of Social Problems via Agent-Based in Market Engineering (Whitestein Series in Software
Simulation: Post Proceedings of The Second International Agent Technologies and Autonomic Computing). Basel:
Workshop on Agent-Based Approaches in Economic and Birkhäuser.
Social Complex Systems. Springer Verlag.
Vandersypen, L.M.K., Yannoni, C.S., & Chuang, I.L.
Tesfatsion, L. (2002). Agent-based computational eco- (2000). Liquid state NMR Quantum Computing.
nomics: Growing economies from the bottom up. Artificial
Vanstone, B., & Finnie, G. (2007). An empirical method-
Life, 8, 55–82. doi:10.1162/106454602753694765
ology for developing stockmarket trading systems using
Thomas, R., Kemp, A., & Lewis, C. (1973).. . Canadian artificial neural networks. Retrieved December 2008,
Journal of Earth Sciences, 10, 226–271. from http://epublications.bond.edu.au/cgi/ viewcontent.
cgi? article=1022&context=infotech_pubs.
Toyoda, Y., & Yano, F. (2004). Optimizing Movement of a
Multi-Joint Robot Arm with Existence of Obstracles Using Von-Wun Soo. (2000). Agent Negotiation under Uncer-
Multi-Purpose Genetic Algorithm. Industrial Engineering tainty and Risk In Design and Applications of Intelligent
and Management Systems, 3(1), 78–84. Agents (LNCS 1881, pp. 31-45). Berlin/Heidelberg:
Springer.
Triani, V., et al. (2007). From Solitary to Collective
Behaviours: Decision Making and Cooperation, In Pro- Wang, T., & Zhang, H. (2004). Collective Sorting with
ceedings of the 9th European Conference in Artificial Multi-Robot. In Proceedings of the First IEEE Inter-
Life (pp. 575-584). national Conference on Robotics and Biomimetics (pp.
716-720).
Tsang, E., Li, J., & Butler, J. (1998). EDDIE beats
the bookies. Int. J. Software. Practice and Experi- Warner, G., Hebda, R., & Hahn, B. (1984).. . Palaeogeog-
ence, 28, 1033–1043. doi:10.1002/(SICI)1097- raphy, Palaeoclimatology, Palaeoecology, 45, 301–345.
024X(199808)28:10<1033::AID-SPE198>3.0.CO;2-1 doi:10.1016/0031-0182(84)90010-5

339
Compilation of References

Warren, M. (1994). Stock price prediction using genetic Wooldridge, M. (2000). Reasoning about Rational Agents.
programming . In Koza, J. (Ed.), Genetic Algorithms at Cambridge, MA: The MIT Press.
Stanford 1994. Stanford, CA: Stanford Bookstore.
Wu, W., Ekaette, E., & Far, B. H. (2003). Uncertainty
Watkins, C. J. H., & Dayan, P. (1992). Techni- Management Framework for Multi-Agent System, Pro-
cal note: Q-learning . Machine Learning, 8, 55–68. ceedings of ATS http://www.enel.ucalgary.ca/People/far/
doi:10.1023/A:1022676722315 pub/papers/2003/ATS2003-06.pdf

Weber, E., Johnson, E., Milch, K., Chang, H., Brodscholl, Xia, Y., Liu, B., Wang, S., & Lai, K. K. (2000). A Model
J., & Goldstein, D. (2007). Asymmetric discounting for Portfolio Selection with Order of Expected Returns.
in intertemporal choice: A query-theory account. Psy- Computers & Operations Research, 27, 409–422.
chological Science, 18, 516–523. doi:10.1111/j.1467- doi:10.1016/S0305-0548(99)00059-3
9280.2007.01932.x
Yao, X. (1999). Evolving artificial networks. Proceedings
Weber, B., Schupp, J., Reuter, M., Montag, C., Siegel, of the IEEE, 87(9), 1423–1447. doi:10.1109/5.784219
N., Dohmen, T., et al. (2008). Combining panel data
Yeh, C.-H., & Chen, S.-H. (2001). Market diversity and
and genetics: Proof of principle and first results. Paper
market efficiency: The approach based on genetic pro-
presented at Annual Conference on Neuroeconomics,
gramming. Journal of Artificial Simulation of Adaptive
Park City, Utah.
Behavior, 1(1), 147–165.
Williams, R. J. (1992). Simple Statistical Gradient Fol-
Zacharia, G., & Maes, P. (2000). Trust management through
lowing Algorithms for Connectionist Reinforcement
reputation mechanisms. Applied Artificial Intelligence Jour-
Learning. Machine Learning, 8, 229–256. doi:10.1007/
nal, 14(9), 881–908. doi:10.1080/08839510050144868
BF00992696
Zhan, W., & Friedman, D. (2007). Markups in double auc-
Wolfram, S. (2002). A New Kind of Science. Wolfram
tion markets. Journal of Economic Dynamics & Control,
Media Inc.
31, 2984–3005. doi:10.1016/j.jedc.2006.10.004
Wood, W., & Kleb, W. (2002). Extreme Programming in a
research environment . In Wells, D., & Williams, L. (Eds.),
XP/Agile Universe 2002 (pp. 89–99). doi:10.1007/3-540-
45672-4_9

340
341

About the Contributors

Shu-Heng Chen is a professor in the Department of Economics and Director of the Center of Interna-
tional Education and Exchange at the National Chengchi University. He also serves as the Director of
the AI-ECON Research Center, National Chengchi University, the editor-in-chief of the Journal of New
Mathematics and Natural Computation (World Scientific), the associate editor of the Journal of Economic
Behavior and Organization, and the editor of the Journal of Economic Interaction and Coordination. Dr.
Chen holds an M.A. degree in mathematics and a Ph. D. in Economics from the University of California
at Los Angeles. He has more than 150 publications in international journals, edited volumes and confer-
ence proceedings. He has been invited to give keynote speeches and plenary talks at many international
conferences. He is also the editor of the volume “Evolutionary Computation in Economics and Finance”
(Physica-Verlag, 2002), “Genetic Algorithms and Genetic Programming in Computational Finance”
(Kluwer, 2002), and the co-editor of the Volume I & II of “Computational Intelligence in Economics
and Finance” (Springer-Verlag, 2002 & 2007), “Multi-Agent for Mass User Support” (Springer-Verlag,
2004), “Computational Economics: A Perspective from Computational Intelligence” (IGI publisher,
2005), and “Simulated Evolution and Learning,” Lecture Notes in Computer Science (LNCS 4247)
(Springer, 2006), as well as the guest editor of Special Issue on Genetic Programming, International
Journal on Knowledge Based Intelligent Engineering Systems (2008). His research interests are mainly
on the applications of computational intelligence to the agent-based computational economics and finance
as well as experimental economics. Details of Shu-Heng Chen can be found at http://www.aiecon.org/
or http://www.aiecon.org/staff/shc/E_Vita.htm.

Yasushi Kambayashi is an associate professor in the Department of Computer and Information En-
gineering at the Nippon Institute of Technology. He worked at Mitsubishi Research Institute as a staff
researcher before joining the Institute. His research interests include theory of computation, theory and
practice of programming languages, and political science. He received his PhD in Engineering from
the University of Toledo, his MS in Computer Science from the University of Washington, and his BA
in Law from Keio University. He is a committee member of IARIA International Multi-Conference on
Computing in the Global Information Technology and IARIA International Conference on Advances in
P2P Systems, a review committee member of Peer-to-Peer Networking and Applications, and a member
of Tau Beta Pi, ACM, IEEE Computer Society, IPSJ, JSSST, IEICE System Society, IADIS, and Japan
Flutist Association.


Hiroshi Sato is an Assistant Professor in the Department of Computer Science at the National Defense
Academy in Japan. He was previously a Research Associate in the Department of Mathematics and
Information Sciences at Osaka Prefecture University in Japan. He holds a degree in Physics from Keio
University in Japan, and Master's and Doctoral degrees in Engineering from the Tokyo Institute of
Technology in Japan. His research interests include agent-based simulation, evolutionary computation,
and artificial intelligence. He is a member of the Japanese Society for Artificial Intelligence and the
Society for Economic Science with Heterogeneous Interacting Agents.

***

Akira Namatame is a Professor in the Dept. of Computer Science, National Defense Academy of Japan.
He holds a degree in Engineering in Applied Physics from the National Defense Academy, a Master of
Science in Operations Research, and a Ph.D. in Engineering-Economic Systems from Stanford University.
His research interests include Multi-agents, Game Theory, Evolution and Learning, Complex Networks,
Economic Sciences with Interacting Agents and a Science of Collectives.

Boris Kovalerchuk is a Professor of Computer Science at Central Washington University, USA. He
received his Ph.D in Applied Mathematics and Computer Science in 1977 from the Russian Academy of
Sciences. His research interests are in the agent theory, uncertainty and logic, neural networks, many-
valued logic, fuzzy sets, data mining, machine learning, visual and spatial analytics, and applications in
a variety of fields. Dr. Kovalerchuk published two books (Springer 2000, 2005) and multiple papers in
these fields and chaired two International Computational Intelligence Conferences. He is collaborating
with researchers at US National Laboratories, Industry, and Universities in the US, Russia, Italy, and UK.

Chung-Ching Tai received his Ph.D. degree in Economics from National Chengchi University,
Taiwan, R.O.C. in 2008. He conducted his post-doctoral studies in AI-ECON Research Center, National
Chengchi University, under Dr. Shu-Heng Chen from 2008 to 2009. He is currently an assistant profes-
sor in the Department of Economics at Tunghai University, Taiwan, R.O.C.

Dipti Srinivasan obtained her M.Eng. and Ph.D. degrees in Electrical Engineering from the National
University of Singapore (NUS) in 1991 and 1994, respectively. She worked at the University of California
at Berkeley’s Computer Science Division as a post-doctoral researcher from 1994 to 1995. In June 1995,
she joined the faculty of the Electrical and Computer Engineering department at the National Univer-
sity of Singapore, where she is an associate professor. From 1998 to 1999, she was a visiting faculty in
the Department of Electrical and Computer Engineering at the Indian Institute of Science, Bangalore,
India. Her main areas of interest are neural networks, evolutionary computation, intelligent multi-agent
systems and application of computational intelligence techniques to engineering optimization, plan-
ning and control problems in intelligent transportation systems and power systems. Dipti Srinivasan
is a senior member of IEEE and a member of IES, Singapore. She has published over 160 technical
papers in international refereed journals and conferences. She currently serves as an associate editor of
IEEE Transactions on Neural Networks, a social editor of IEEE Transactions on Intelligent Transportation
Systems, area editor of International Journal of Uncertainty, Fuzziness and Knowledge-based Systems,
and as a managing guest editor of Neurocomputing.


Farshad Fotouhi received his Ph.D. in computer science from Michigan State University in 1988. He
joined the faculty of Computer Science at Wayne State University in August 1988 where he is currently
Professor and Chair of the department. Dr. Fotouhi’s major areas of research include xml databases,
semantic web, multimedia systems, and biocomputing. He has published over 100 papers in refereed
journals and conference proceedings, served as program committee member of various database related
conferences. Dr. Fotouhi is on the Editorial Boards of the IEEE Multimedia Magazine and the Interna-
tional Journal on Semantic Web and Information Systems and he serves as a member of the Steering
Committee of the IEEE Transactions on Multimedia.

Germano Resconi is a Professor of Artificial Intelligence at the Department of Mathematics and
Physics at the Catholic University in Brescia, Italy. He received his Ph.D. degree in physics from the
University of Milan in 1968. His research interests have led him to participate in a variety of activities,
both in the theoretical and applied studies in the agent theory, uncertainty and logic, neural networks,
morphic computing, many-valued logic, robotics, fuzzy sets, modal logic, quantum mechanics, quan-
tum computation and tensor calculus. He is collaborating with researchers at the University of Pisa,
University of Berkeley, University of Beijing, and other institutions.

Guy Meadows has been a faculty member at the University of Michigan since 1977. His areas of
research include: field and analytical studies of marine environmental hydrodynamics, with emphasis
on mathematical modeling of nearshore waves, currents and shoreline evolution; active microwave
remote sensing of ocean dynamics, including wave/wave, wave/current, and wave/topographic interactions,
with recent work in signatures of surface ship wakes and naturally occurring ocean surface processes;
and the development of in situ and remote oceanographic instrumentation and data acquisition systems
designed to measure the spatial and temporal structure of coastal boundary layer flows.

Hafiz Farooq Ahmad is an Associate Professor at the School of Electrical Engineering and Computer
Science (SEECS), NUST Islamabad, Pakistan, and also has a joint appointment as a Consultant Engineer at
DTS Inc., Tokyo, Japan. He received his PhD from Tokyo Institute of Technology in 2002 under the supervi-
sion of Prof. Kinji Mori. His main research topics are autonomous decentralized systems, multi-agent
systems, autonomous semantic grid and semantic web. He is a member of IEEE.

Hidemi Yamachi is an assistant professor in the Department of Computer and Information Engi-
neering at the Nippon Institute of Technology, Japan. His research interests include optimization
methods based on evolutionary computation and visualization. He received his PhD from Tokyo Met-
ropolitan University.

Hidenori Kawamura is an associate professor in the graduate school of Information Science and
Technology, Hokkaido University, Japan. His research interests include information science, complex
systems and multi-agent systems. He received his PhD, MS and BA in Information Engineering from
Hokkaido University. Contact him at Graduate School of Information Science and Technology, Hokkaido
University, North 14, West 9, Sapporo, Hokkaido, 060-0814, Japan; [email protected]

Hiroki Suguri is a Professor of Information Systems at the School of Project Design, Miyagi University,
where he teaches systems design, object-oriented modeling, Java programming and information literacy.
Professor Suguri received his Ph.D. in software information systems from Iwate Prefectural University


in 2004. His research interests include multi-agent systems, semantic grid/cloud infrastructure, man-
agement information systems, and computer-aided education of information literacy.

Hiroshi Takahashi is an associate professor at the Graduate School of Business Administration, Keio
University. He received his BA in 1994 from the University of Tokyo, Japan, and MS and PhD degrees in
2002 and 2004 from the University of Tsukuba, Japan. He worked as a research scientist at the Miyanodai
technology development center of Fuji Photofilm Co., Ltd., and at Mitsui Asset Trust and Banking Co., Ltd.
His research interests include Finance, Financial Engineering, Time Series Analysis, Decision Theory,
and Genetic Algorithm-based machine learning.

Hisashi Yamamoto is an associate professor in the Department of System Design at the Tokyo
Metropolitan University. He received a BS degree, an MS degree, and a Dr. Eng. in Industrial Engineer-
ing from Tokyo Institute of Technology, Japan. His research interests include reliability engineering,
operations research, and applied statistics. He received best paper awards from REAJ and IEEE Reli-
ability Society Japan Chapter.

John O’Shea is a Professor of Anthropology and Curator of Great Lakes Archaeology in the Museum
of Anthropology. He earned his Ph.D. in Prehistoric Archaeology from Cambridge University in 1978.
His research focused on the ways in which the archaeological study of funerary customs could be used
to recover information on the social organization of past cultures. O’Shea maintains active research
interests in Eastern Europe and North America. His topical interests include: tribal societies, prehis-
toric ecology and economy, spatial analysis, ethnohistory, Native North America and later European
Prehistory. His research in Native North America focuses on the late pre-contact and contact periods in
the Upper Great Lakes and the Great Plains. In Europe, his research centers on the eastern Carpathian
Basin region of Hungary, Romania and northern Yugoslavia during the later Neolithic through Bronze
Age. Most recently, he has begun a program of research focused on the study of Nineteenth Century
shipwrecks in the Great Lakes. In addition, O’Shea directs a series of programs in local archaeology,
including the Archaeology in an Urban Setting project within the City of Ann Arbor, and the Vanishing
Farmlands Survey in Washtenaw County. Within the profession, O’Shea is the editor-in-chief of the
Journal of Anthropological Archaeology (Academic Press). He has also been active in the implementa-
tion of the Native American Grave Protection and Repatriation Act (NAGPRA) and was appointed in
1998 to a six-year term on the NAGPRA Review Committee by the Secretary of the Interior.

Kazuhiro Ohkura received a PhD degree in engineering from Hokkaido University in 1997. He
is currently a professor in the graduate school of Mechanical Systems Engineering at Hiroshima Uni-
versity, Japan, and the leader of Manufacturing Systems Laboratory. His research interests include
evolutionary algorithms, reinforcement learning and multiagent systems.

Keiji Suzuki is a professor in the graduate school of Information Science and Technology, Hokkaido
University, Japan. His research interests include information science, complex systems and multi-agent
systems. He received his PhD, MS and BA in Precision Engineering from Hokkaido University. Contact
him at Graduate School of Information Science and Technology, Hokkaido University, North 14, West 9,
Sapporo, Hokkaido, 060-0814, Japan; [email protected]


Mak Kaboudan is a full professor of statistics in the School of Business, University of Redlands.
Mak has an MS (1978) and a Ph. D. (1980) in Economics from West Virginia University. Before join-
ing Redlands in 2001, he was a tenured associate professor at Penn State's Smeal College of Business.
Prior to joining Penn State, he worked as a management consultant for five years. His consulting work
is mostly in economic and business planning as well as energy and macro-economic modeling. Mak’s
current research interests are focused on forecasting business, financial, and economic conditions using
statistical and artificial intelligence modeling techniques. His work is published in many academic jour-
nals such as the Journal of Forecasting, Journal of Real Estate Literature, Journal of Applied Statistics,
Computational Economics, Journal of Economic Dynamics and Control, Computers and Operations
Research, and Journal of Geographical Systems.

Masanori Goka received a PhD degree in engineering from Kobe University in 2007. He is currently
a researcher in the Hyogo Prefectural Institute of Technology, Japan. His research includes multiagent
systems, emergence systems and embodied cognition.

Masao Kubo graduated from the Department of Precision Engineering, Hokkaido University, in 1991.
He received his Ph.D. degree in Computer Science from Hokkaido University in 1996. He was a research
assistant in the Chaotic Engineering Lab, Hokkaido University (1996-1999), and a lecturer in the Robotics
Lab, Department of Computer Science, National Defense Academy, Japan. He was a visiting research
fellow at the Intelligent Autonomous Lab, University of the West of England (2003-2005). Now, he is an
associate professor in the Information System Lab, Department of Computer Science, National Defense Academy, Japan.

Munehiro Takimoto is an assistant professor in the Department of Information Sciences at Tokyo
University of Science, Japan. His research interests include the design and implementation of programming
languages. He received his BS, MS and PhD in Engineering from Keio University.

Azzam ul Asar completed his BSc in Electrical Engineering and MSc in Electrical Power Engi-
neering from NWFP UET Peshawar in 1979 and 1987, respectively. He completed his PhD in Artificial
Neural Networks from the University of Strathclyde, Glasgow, UK in 1994, followed by a post-doctorate in
Intelligent Systems in 2005 from the New Jersey Institute of Technology, Newark, USA. He also served as
a visiting Professor at the New Jersey Institute of Technology, USA from June 2004 to June 2005. Currently
he is acting as the chair of the IEEE Peshawar Subsection and the IEEE Power Engineering joint chapter. He
has been Dean of the Faculty of Engineering and Technology, Peshawar, NWFP, since November 2008.

R. Suzuki received his Ph.D. degree from Nagoya University in 2003. He is now an associate pro-
fessor in the graduate school of information science, Nagoya University. His main research fields are
artificial life and evolutionary computation. In particular, he is investigating how evolutionary processes
can be affected by ecological factors such as lifetime learning (phenotypic plasticity), niche construc-
tion, and network structures of interactions.

Ren-Jie Zeng has been an assistant research fellow at the Taiwan Institute of Economic Research from
2008 to the present. His current research covers macroeconomic and industrial studies of the Chinese
Economy. He holds an M.A. degree in Economics from National Chengchi University in Taiwan.


Robert G. Reynolds received his Ph.D. degree in Computer Science, specializing in Artificial Intel-
ligence, in 1979 from the University of Michigan, Ann Arbor. He is currently a professor of Computer
Science and director of the Artificial Intelligence Laboratory at Wayne State University. He is an Adjunct
Associate Research Scientist with the Museum of Anthropology at the University of Michigan-Ann
Arbor. He is also affiliated with the Complex Systems Group at the University of Michigan-Ann Arbor
and is a participant in the UM-WSU IGERT program on Incentive-Based Design. His interests are in
the development of computational models of cultural evolution for use in the simulation of complex
organizations and in computer gaming applications. Dr. Reynolds produced a framework, Cultural
Algorithms, in which to express and computationally test various theories of social evolution using
multi-agent simulation models. He has applied these techniques to problems concerning the origins of
the state in the Valley of Oaxaca, Mexico, the emergence of prehistoric urban centers, the origins of
language and culture, and the disappearance of the Ancient Anasazi in Southwestern Colorado using
game programming techniques. He has co-authored three books: Flocks of the Wamani (1989, Academic
Press), with Joyce Marcus and Kent V. Flannery; The Acquisition of Software Engineering Knowledge
(2003, Academic Press), with George Cowan; and Excavations at San Jose Mogote 1: The Household
Archaeology with Kent Flannery and Joyce Marcus (2005, Museum of Anthropology-University of
Michigan Press). Dr. Reynolds has received funding from both government and industry to support his
work. He has published over 250 papers on the evolution of social intelligence in journals, book chapters,
and conference proceedings. He is currently an associate editor for the IEEE Transactions on Compu-
tational Intelligence in Games, IEEE Transactions on Evolutionary Computation, International Journal
of Swarm Intelligence Research, International Journal of Artificial Intelligence Tools, International
Journal of Computational and Mathematical Organization Theory, International Journal of Software
Engineering and Knowledge Engineering, and the Journal of Semantic Computing.

Saba Mahmood is a PhD student at School of Electrical Engineering and Computer Science, NUST
Islamabad Pakistan. Her MS research area was Reputation Systems for Open Multiagent Systems. Her
PhD research is about Formalism of Trust in Dynamic Architectures. She served as a lecturer in the
department of computer science at the Institute of Management Sciences Peshawar from 2004-2007.
She is an active member of IEEE and remained academic chair of the IEEE Peshawar subsection.

Sachiyo Arai received the B.S. degree in electrical engineering, the M.S. degree in control engineer-
ing and cognitive science, and the Ph.D. degree in artificial intelligence from Tokyo Institute of Technology
in 1998. She worked at Sony Corporation for 2 years after receiving the B.S. degree. After receiving the
Ph.D. degree, she spent a year as a research associate at Tokyo Institute of Technology, worked as a
Postdoctoral Fellow at the Robotics Institute at Carnegie Mellon University from 1999 to 2001, and was a
visiting Associate Professor in the Department of Social Informatics at Kyoto University from 2001 to 2003.
Currently, she is an Associate Professor of Urban Environment Systems, Faculty of Engineering, Chiba University.

Shu G. Wang is an associate professor in the Department of Economics and also serves as the As-
sociate Director of the AI-ECON Research Center, National Chengchi University. Dr. Wang holds a
Ph. D. in Economics from Purdue University. His research interests are mainly on microeconomics,
institutional economics, law and economics and recently in agent-based computational economics and
experimental economics.


T. Arita received his B.S. and Ph.D. degrees from the University of Tokyo in 1983 and 1988. He is
now a professor in the graduate school of information science at Nagoya University. His research interest
is in artificial life, in particular in the following areas: evolution of language, evolution of cooperation,
interaction between evolution and learning, and swarm intelligence.

T. Logenthiran obtained his B.Eng. degree in the department of Electrical and Electronic Engineer-
ing, University of Peradeniya, Sri Lanka. He is currently pursuing Ph.D. degree in the department of
Electrical and Computer Engineering, National University of Singapore. His main areas of interest are
distributed power systems and the application of intelligent multi-agent systems and computational intel-
ligence techniques to power engineering optimization.

Takao Terano is a professor at the Department of Computational Intelligence and Systems Science,
Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology. He re-
ceived a BA degree in Mathematical Engineering in 1976 and an M.A. degree in Information Engineering
in 1978 both from University of Tokyo, and Doctor of Engineering Degree in 1991 from Tokyo Institute
of Technology. His research interests include Agent-based Modeling, Knowledge Systems, Evolution-
ary Computation, and Service Science. He is a member of the editorial board of major Artificial Intel-
ligence- and System science- related academic societies in Japan and a member of IEEE, AAAI, and
ACM. He is also the president of PAAA.

Tina Yu received the M.Sc. Degree in Computer Science from Northeastern University, Boston, MA
in 1989. Between 1990 and 1995, she was a Member of Technical Staff at Bell-Atlantic (NYNEX) Sci-
ence and Technology, White Plains, NY. She went to University College London in 1996 and completed
her PhD in 1999. Between August 1999 and September 2005, she was with the Math Modeling team at
Chevron Information Technology Company, San Ramon, CA. She joined the Department of Computer
Science, Memorial University of Newfoundland in October of 2005.

Tzai-Der Wang is a researcher who specializes in Artificial Life, Evolutionary Computation, Genetic
Algorithms, Artificial Immune Systems, and Estimation of Distribution Algorithms. He was the program
chair of CIEF2007 and a program co-chair of SEAL2006. He worked at AIECON, National Chengchi Uni-
versity, Taiwan ROC, as a post-doctoral researcher in 2008 and is currently an assistant professor in the
Department of Industrial Engineering and Management, Cheng Shiu University, Taiwan ROC. He also
maintains an international research relationship with the Applied Computational Intelligence Research
Unit of the University of the West of Scotland, Scotland, and works with other researchers there. He was
supervised by Professor Colin Fyfe and graduated from the University of Paisley in 2002 with a PhD on
the evolution of cooperation in artificial communities.

Xiangdong Che received his Bachelor's degree in Electrical Engineering and an M.S. in Computer
Science from Zhejiang University, China. He received his PhD in Computer Science from Wayne State
University in 2009. He has been working as a computer engineer for 16 years. He is currently working
for Wayne State University Computing and Information Technology Division. His research interests
primarily focus on Cultural Algorithms, Socially Motivated Learning, Evolutionary Computation,
Complex Systems, optimization, intelligent agents, and multi-agent simulation systems.


Y. Iwase received a B.S. degree from Toyama University in 2005 and an M.S. degree from Nagoya
University in 2007. Now, he is a Ph.D. student in the Graduate School of Information Science at Nagoya
University. His research interests include cellular automata, evolutionary computation and artificial life.
In particular, he investigates cellular automata interacting with the external environment.

Yasuhiro Tsujimura is an associate professor in the Department of Computer and Information
Engineering at the Nippon Institute of Technology, Japan. His research interests include evolution-
ary computations and their applications to operations research, reliability engineering and economics.
Recently, he has become interested in swarm intelligence and artificial life. He received his B.E., M.E. and Dr. Eng.
in system safety engineering from Kogakuin University, Japan.

Yousof Gawasmeh is a Ph.D. student in computer science at Wayne State University. He received
his M.S. degree in computer science from the New York Institute of Technology. He also holds a B.S.
degree from Yarmouk University. He is currently a Graduate Teaching Assistant in computer science at
Wayne State University. He previously worked as a lecturer in the Software Engineering Department at
Philadelphia University, Jordan. Yousof is interested in artificially intelligent systems that have to
operate in game domains. Most of his research centers around techniques for learning and planning for
teams of agents to act intelligently in their environments. He is concentrating on implementing the
Cultural Algorithm in multi-agent systems games to organize the agents and direct them toward the
optimal solution. One of his applications is the landbridge game.

Yukiko Orito is a lecturer in the Graduate School of Social Sciences (Economics) at the Hiroshima
University. Her research interests include the analysis of combinatorial optimization problems in financial
research. She received Dr. Eng., MS and BA in Production and Information Engineering from the Tokyo
Metropolitan Institute of Technology.

Kazuteru Miyazaki is an Associate Professor at the Department of Assessment and Research for
Degree Awarding, National Institution for Academic Degrees and University Evaluation. His other
accomplishments include: 1996, Assistant Professor, Tokyo Institute of Technology; 1998, Research
Associate, Tokyo Institute of Technology; 1999, Associate Professor, National Institution for Academic
Degrees; and 2000, Associate Professor, National Institution for Academic Degrees and University
Evaluation. Miyazaki’s main works include “A Reinforcement Learning System for Penalty Avoiding in
Continuous State Spaces,” Journal of Advanced Computational Intelligence and Intelligent Informatics,
Vol. 11, No. 6, pp. 668-676, 2007, with S. Kobayashi, and “Development of a reinforcement learning
system to play Othello,” Artificial Life and Robotics, Vol. 7, No. 4, pp. 177-181, 2004, with S. Tsuboi and
S. Kobayashi. Miyazaki is also a member of the Japanese Society for Artificial Intelligence (JSAI), the
Society of Instrument and Control Engineers (SICE), the Information Processing Society of Japan (IPSJ),
the Japan Society of Mechanical Engineers (JSME), the Robot Society of Japan (RSJ), and the Japanese
Association of Higher Education Research.


Index

A

agent-based artificial stock markets 118
agent-based computational economics (ACE) 35, 36, 41, 43
agent-based models 19, 20, 21, 22, 25, 26, 27, 30, 31, 32, 93, 133, 134, 135
agent-based uncertainty theory (AUT) 50, 51, 52, 58, 65, 66, 67, 69, 70, 71, 73, 74
agent engineering 35, 36
agent identifiers (AID) 214
agent property 33
agents 134, 135, 138, 141, 143, 146, 147
agents space 54
Agile Program Design methodologies 197
Agile software design 194
Alpena-Amberley Land Bridge 197
Alpena-Amberly Ridge 194
amount of reward value 235
ant colony clustering (ACC) 175, 176, 177, 179, 181, 182, 183, 184, 186, 187, 188, 189, 190
ant colony optimization (ACO) 174, 175, 177, 181, 191, 251, 255, 297
ANT Net 257
ants 295, 296, 297, 301, 304, 305, 307
ant war 297, 298, 300, 305, 306
apprenticeship learning 268, 283
artificial evolution 156, 157, 159, 171, 172
artificial intelligence in economics-double auction (AIE-DA) 100, 101, 115
artificial markets 118, 133
artificial neural networks (ANN) 1, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 18, 156, 157
artificial stigmergy 174, 175
asset pricing 134, 142, 151
asymmetric information 120
atoms 59
attractive pheromone 301, 302, 303
automatic defined functions (ADF) 102
autonomous mobile robots 156, 157, 158, 172
autonomous underwater vehicles (AUVs) 197

B

backward digit span test (BDG) 104, 110
Bayesian Game Against Nature (BGAN) 101, 102
bees 296
bid/ask (BA) 80, 81, 84, 85
Boolean algebra 60, 62
bounded rationality 79, 80, 81, 93
brainstem 37
buy/sell (BS) 80

C

California Association of Realtors (C.A.R.) 3, 16
call for proposal (CFP) 223, 224
capital asset pricing model (CAPM) 134, 135
cash 118, 125, 130, 131
cash out loans (COL) 1, 7, 8, 9, 11
CASK 85, 89, 90, 91, 92
Cellular Automata (CAs) 309, 310, 311, 312, 313, 314, 315, 316, 317, 319, 320
cerebral cortex 37
change in housing inventory (CHI) 1, 7, 11
chemical substances 295, 296
Chicago Mercantile Exchange (CME) 2, 16, 80
chromosomes 300, 301
clustering algorithm 191
clustering points 176
co-evolution 78, 81, 82, 86, 90, 91, 92, 94
cognitive capacity 96, 97, 99, 100, 104, 105, 108, 111, 112, 113, 116
collective behaviour 156, 159, 166, 172
communication networks 174, 175, 188
component object model (COM) 225, 226
computer simulations 134
conflicting agents 50, 55, 76
conflict structure 269, 286, 287, 289
constant absolute risk aversion (CARA) 40
constant relative risk aversion (CRRA) 40
continuing task 232, 233, 235, 238, 243, 244, 245
continuous time recurrent neural networks (CTRNN) 156, 157, 172
coordination between agents 220
corporate finance 134
credit assignment 235
Cultural Algorithms 195, 201, 202, 207
CurrentTime 237

D

decentralized pervasive systems 309, 310, 311, 320
decision making rules, micro-level 134, 135
designing the reward function 233
disobedient follower 23, 27
dissipative cellular automata 311
distributed energy resource 208, 230
distributed generation (DG) 210
dopaminergic reward prediction error (DRPE) 41
double auction (DA) 79, 80, 81, 82, 84, 92, 93
Dynamic Programming (DP) 267, 268

E

Early Archaic occupation 195
Easley-Ledyard (EL) 102
eBay 250
efficient market hypothesis (EMH) 119
emergent behaviors 311, 317, 319
evolutionary agent design 306
evolutionary computation 300
evolutionary robotics (ER) 156, 172
evolving artificial neural network (EANN) 156, 157, 160, 171
evolving autonomous robots 157
evolving CTRNN (eCTRNN) 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 171
experience manager 253, 254
Exploitation-oriented Learning (XoL) 267, 268, 277, 282, 283

F

financial economics research 134
financial market analyses 134
financial markets 118, 134, 135, 136, 137, 138, 139, 141, 143, 146, 149, 151, 153, 154
financial markets, efficiency of 134, 141
FIRE 249, 251, 255, 257, 260, 262, 263, 264
first order conflict (FOC) 54
follower agents 19, 20, 22, 23, 24, 25, 26, 27, 30, 33
foraging agents 194, 201
Friedman, Milton 118, 119
functional magnetic resonance imaging (fMRI) 38, 40, 45
fusion 50, 51, 54, 67, 68, 70, 71, 76
fuzzy logic 50, 51

G

generalized feedforward networks (GFF) 6
genetic algorithms (GA) 20, 21, 30, 31, 100, 157, 312, 313, 314, 319
genetic programming (GP) 1, 2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 16, 17, 18, 79, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 97, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 115, 116, 117
Gjerstad-Dickhaut (GD) 98, 101
global configurations 315, 317, 319
goal-agent set 273, 274, 289, 290
Godderich region 194
Grid environment 250

H

Hedonic psychology 38
heterogeneous agent models (HAM) 121
heterogeneous cell states 319
high-density pheromones 299
hominid brain 37
homogeneous cell states 319
Housing Predictor 3, 17
hunting agents 193, 194, 197, 201, 202, 203, 204, 205, 206
hyperbolic absolute risk aversion (HARA) 40

I

identical neural network 300
immediate physical environment 201
immediate social environment 201
independent follower 24, 31
independent system operator (ISO) 210, 211, 212, 213, 215, 218, 219, 223, 224, 226, 228
information ratio 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33
intelligent agent 201
intelligent robot control 191
intelligent robots 175
interacting agent hypothesis (IAH) 119
Inverse Reinforcement Learning (IRL) 268
investment strategy, active 134, 151
investment strategy, passive 134, 142, 146, 147, 149, 151
investor behaviors 134
investor overconfidence 134, 138, 139, 140, 141, 142, 149
investors, irrational 119
Ising model 119, 121, 133

J

java agent development (JADE) 214, 216, 220, 226, 228

K

Khepera robots 157

L

Lake Huron 193, 194, 195, 196, 197
LastActionTime 237, 244
leader agents 19, 20, 22, 23, 24, 25, 26, 33
limbic system 37
local optimum 277
Logic of Uncertainty 75
Logit model 119, 121, 122, 133
Los Angeles MSA (LXXR) 1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13, 14, 15

M

mammalian brain 37
Manets 258, 259, 260
Marine Hydrodynamics Laboratory 197
market clearing price (MCP) 212, 224, 226, 227
market portfolios 135
Markov Decision Processes (MDPs) 267, 268, 270
MBEANN 157, 160, 164, 165, 166, 167, 168, 169, 170, 171, 172
mean absolute percent error (MAPE) 5, 15
mean square error (MSE) 4, 5, 6, 8, 9, 11, 12, 13
memory-based approach 276, 277
memory updating test (MU) 104, 110
metropolitan statistical areas (MSA) 2, 3
minority games 118
mobile agents 174, 175, 176, 177, 178, 179, 181, 184, 188
mobile robot 174, 175, 184, 188, 191
mobile software agents 174, 175, 177, 188
multiagent cooperative task 232, 233, 235
multi-agent systems (MAS) 1, 3, 4, 5, 36, 37, 209, 210, 214, 221, 226, 230, 249, 250, 251, 253, 262, 265, 266, 310
multiagent task 233, 235, 243, 245
multilayer perceptrons (MLP) 6
multi-robots 174, 175, 176, 181, 184, 188, 191
multi-robot system (MRS) 156, 157, 172, 174, 175, 177, 184, 189
multi-start method 267, 268, 271
Museum of Anthropology 197, 198
mutual exclusive (ME) 53, 58

N

nasanov pheromones 296
neural networks 50, 51, 69, 70, 71, 76, 156, 166, 170, 171
neuroeconomics 35, 36, 38, 40, 41, 43, 44, 45, 46, 47, 48
New York Stock Exchange (NYSE) 80
noble method 156
NofT(learning) 280
NofT(test) 280
noise traders 119, 132, 133
normal distribution table 280

O

obedient follower 23, 27, 31
online electronic communities 250
Ontario 194, 196, 197, 198
open systems 249
operation span test (OS) 104, 110
optimal policy 267, 269, 272, 273, 275, 276
optimal power flow (OPF) 224, 225, 226
ordinary least squares (OLS) 4, 5, 9, 11, 12, 13, 15, 16

P

Paleo-Indian 193, 194, 195, 197, 206
Paleo-Indian occupation 194
Partially Observable (PO) 269, 276, 277, 279, 280
Partially Observed Markov Decision Processes (POMDPs) 268, 269, 270, 271, 274, 276, 277, 279, 282, 283, 284, 286, 288, 291
Particle Swarm Optimization (PSO) 251, 252
periodic boundary condition 312
perturbation 309, 311, 312, 313, 316, 317, 319
pheromone 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307
pheromone trails 252
PMaxAsk 85, 89, 90, 91, 92
PMin 85, 89, 90
pole balancing task 233
policy invariance 268
PoolCo market model 208, 212, 213, 230
portfolio optimization 19, 20, 21, 22, 27, 31
power exchange (PX) 210, 211, 212, 213, 215, 219, 223, 224, 228
price fluctuations 134
pursuit game 235

Q

quantum computer 59, 60, 61, 62, 65, 66, 67, 73, 74
queen pheromones 296
query theory 39, 46

R

radio frequency identification (RFID) 175, 176, 178, 179, 184, 185, 186, 189
random walk (RA) 268, 277, 278, 279, 280, 281, 282, 283, 285, 291, 292, 293
rationality theorem of Profit Sharing (PS) 267, 268, 269, 270, 271, 272, 273, 274, 276, 277, 278, 279, 280, 281, 282, 283, 284, 288, 291, 292, 293
Rational Policy Making algorithm 268
rational traders 133
recommendation manager 253, 254
regression model (RM) 4, 9, 11
REGRET 249
reinforcement function 269, 274, 286
reinforcement interval 269, 271, 274, 287, 289
reinforcement learning (RL) 267, 268, 270, 277, 279, 281, 282
remote operated vehicles (ROVs) 197
reptilian brain 37
repulsive pheromone 301
reputation manager 253
reputation model 248, 251, 253, 256, 257, 258, 261, 262, 264
reward prediction error (RPE) 41, 42, 48
robot controller 156, 158, 160, 166
robots 174, 175, 176, 177, 178, 179, 180, 181, 184, 185, 186, 188, 189, 190, 191
rostral cingular zone (RCZ) 42
roulette selection 278, 280
rule-based system 208, 230

S

Santa Fe Token Exchange (SFTE) 80
scaled density distribution 313, 318, 319
schedule coordinator (SC) 215, 223, 224
search space splitting 20, 22, 26, 31
self-organizing behaviors 309, 311, 312, 315, 319, 320
semi-Markov decision process (SMDP) 234, 235, 236
sensory input 268, 269, 270, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 286, 290
sentence span test (SS) 104, 110
sharpe ratio 20
simulation agent 175, 176, 178, 179, 181
Smith, Adam 79, 92
social behaviors, macro-level 134, 135
social insects 295, 296
social intelligence 194, 195, 201, 202
spatial short-term memory test (SSTM) 104, 110
SPORAS 250
stability of transitions 317
Standard & Poor’s Case-Shiller Indexes (S&P CSI) 2, 3
stigmergy 251, 252, 296, 297, 305, 306
Stochastic Gradient Ascent (SGA) 277, 278, 281, 282, 283, 285
stochastic policy 277, 279, 281, 282
stocks 118, 130, 131
suppression conditions 271, 287, 288, 289
Sussex approach 157
swarm behavior 295
swarm intelligence 191
swarm intelligence (SI) 251, 252

T

time of reward assignment 235
time series genetic programming (TSGP) 5, 6, 17
TOPIX 26
topology and weight evolving artificial neural networks (TWEANN) 156, 157, 160
traders 118, 119, 120, 121, 124, 126, 127, 128, 129, 130, 131, 132
traders, chartists 118, 121, 127
traders, fundamentalists 118, 120, 121, 126
traders, imitators 118, 119, 126, 127, 128, 129, 130, 131
traders, rational 118, 119, 126, 127, 128, 129, 130
trading strategies 118, 119, 120, 123, 130, 131
TRAVOS 250, 251

U

useful rational policy 268
Utility Gain (UG) 251, 257, 258, 262, 263

V

Virtual Organization 251

W

Wayne State University 193, 194

X

Xtreme Programming 199

Z

zero-intelligence constrained (ZIC) 101, 103
zero-intelligence plus (ZIP) 98, 101
