AI Unit III
FOUNDATION OF
ARTIFICIAL INTELLIGENCE
NOTES
UNIT III
SYLLABUS
3.1 DEFINITION
Agent architectures, like software architectures, are formally descriptions of the elements from which a system is built and the way those elements communicate. Further, these elements can be defined from patterns with specific constraints. [Shaw/Garlan 1996]
1. Several common architectures exist, such as the pipe-and-filter and layered architectures.
2. These define the interconnections between components.
3. Pipe-and-filter defines a model where data moves through a set of one or more objects, each of which performs a transformation.
4. Layered simply means that the system is composed of a set of layers, each providing a specific set of logical functionalities, with connectivity commonly restricted to contiguous layers.
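The pipe-and-filter pattern above can be sketched in a few lines. This is an illustrative Python sketch, not taken from any particular agent framework; the filter names are invented for the example.

```python
# A minimal pipe-and-filter sketch: data flows through a chain of
# transformation stages (filters) connected by simple function composition.

def make_pipeline(*filters):
    """Compose filters so data passes through each in order."""
    def run(data):
        for f in filters:
            data = f(data)
        return data
    return run

# Example filters: tokenize, normalize case, drop very short tokens.
tokenize = lambda text: text.split()
lowercase = lambda tokens: [t.lower() for t in tokens]
drop_short = lambda tokens: [t for t in tokens if len(t) > 2]

pipeline = make_pipeline(tokenize, lowercase, drop_short)
result = pipeline("The Pipe AND Filter model")
```

Each filter is independent of the others, which is the point of the pattern: stages can be added, removed, or reordered without touching the rest of the chain.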
Figure 3.3 The blackboard architecture supports multi-agent problem solving.
4. The mobile agent framework provides a protocol that permits communication between hosts for agent
migration.
5. This framework also requires some form of authentication and security, to prevent the mobile agent framework from becoming a conduit for viruses. Also implicit in the mobile agent framework is a means for discovery.
6. For example, which hosts are available for migration, and what services do they provide? Communication is
also implicit, as agents can communicate with one another on a host, or across hosts in preparation for
migration.
7. The mobile agent architecture is advantageous because it supports the development of intelligent distributed systems, and in particular distributed systems that are dynamic, whose configuration and loading are defined by the agents themselves.
10. Subsumption is also reactive in nature, meaning that, in the end, the architecture still simply maps inputs to behaviours (no planning occurs, for example). What subsumption does provide is a means to choose which behaviour to apply in a given environment.
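The behaviour-selection idea can be sketched as a priority-ordered list of behaviours, where the first applicable one suppresses those below it. A minimal sketch, assuming invented behaviours and a dictionary percept:

```python
# A minimal subsumption sketch: behaviours are ordered by priority, and the
# highest-priority behaviour whose trigger matches the current percept
# suppresses (subsumes) those below it.

def avoid_obstacle(percept):
    return "turn" if percept.get("obstacle") else None

def seek_light(percept):
    return "approach" if percept.get("light") else None

def wander(percept):
    return "wander"  # default behaviour, always applicable

# Ordered from highest to lowest priority.
LAYERS = [avoid_obstacle, seek_light, wander]

def select_action(percept):
    for behaviour in LAYERS:
        action = behaviour(percept)
        if action is not None:
            return action
```

Note that no plan is ever built: the mapping from percept to action is direct, which is exactly the reactive property described above.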
10. ATLANTIS (Deliberative Architecture)
1. The goal of ATLANTIS (A Three-Layer Architecture for Navigating Through Intricate Situations) was to create a robot that could navigate through dynamic and imperfect environments in pursuit of explicitly stated high-level goals.
2. ATLANTIS was to prove that a goal-oriented robot could be built from a hybrid architecture of lower-level reactive behaviours and higher-level deliberative behaviours.
4. The architecture also includes a plan executor (or interpreter), which is used to execute the plan at the actuators. The architecture also included a variety of monitor processes. The basic idea behind Homer was to build an architecture for general intelligence.
5. The keyboard would allow regular English language input, and a terminal would display generated English
language sentences. The user could therefore communicate with Homer to specify goals and receive feedback
via the terminal.
6. Homer could log perceptions of the world, with timestamps, to allow dialogue with the user and rational
answers to questions. Reflective (monitor) processes allow Homer to add or remove knowledge from the
episodic memory.
7. Homer is an interesting architecture implementing several interesting ideas, from natural language
processing to planning and reasoning. One issue found in Homer is that when the episodic memory grows large,
it tends to slow down the overall operation of the agent.
7. PRS is a useful architecture when all necessary operations can be predefined. It is also very efficient, due to the lack of plan generation. This makes PRS an ideal agent architecture for building agents such as those that control mobile robots.
13. AGLETS (MOBILE)
1. Aglets is a mobile agent framework designed by IBM Tokyo in the 1990s. Aglets is based on the Java programming language, as Java is well suited to a mobile agent framework. First, the applications are portable to any system (both homogeneous and heterogeneous) that can run a Java Virtual Machine (JVM). Second, a JVM is an ideal platform for migration services.
2. Java supports serialization, which is the aggregation of a Java application's program and data into a single object that is restartable.
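As an analogy, Python's pickle module offers the same aggregate-and-restart idea; the Agent class below is hypothetical and not part of Aglets.

```python
import pickle

# An analogy to Java serialization: an agent's state is serialized into a
# byte string that can be shipped to another host and "restarted" there.
# The Agent class is illustrative, invented for this sketch.

class Agent:
    def __init__(self, task, progress=0):
        self.task = task
        self.progress = progress

    def step(self):
        self.progress += 1

agent = Agent("index-documents")
agent.step()

payload = pickle.dumps(agent)     # aggregate state into one transportable object
restored = pickle.loads(payload)  # "restart" on the receiving host
```

The restored agent carries its task and progress with it, which is the essence of what a migration service needs from serialization.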
3. In this case, the Java application is restarted on a new JVM. Java also provides a secure environment
(sandbox) to ensure that a mobile agent framework doesn’t become a virus distribution system. The Aglets
framework is shown in Figure 3.9. At the bottom of the framework is the JVM (the virtual machine that
interprets the Java byte codes). The agent runtime environment and mobility protocol are next. The mobility
protocol, called Aglet Transport Protocol (or ATP), provides the means to serialize agents and then transport
them to a host previously defined by the agent.
4. The agent API is at the top of the stack, which in usual Java fashion, provides several API classes that focus
on agent operation. Finally, there are the various agents that operate on the framework.
5. The agent API and runtime environment provide several services that are central to a mobile agent
framework. Some of the more important functions are agent management, communication, and security.
Agents must be able to register themselves on a given host to enable communication from outside agents.
6. In order to support communication, security features must be implemented to ensure that the agent has
the authority to execute on the framework.
7. Aglets provides several necessary characteristics for a mobile agent framework, including mobility, communication, security, and confidentiality. Aglets provides only weak migration, in that agents can migrate only at specific points within the code (such as with the dispatch method).
14. MESSENGERS (MOBILE)
1. Messengers is a runtime environment that provides a form of process migration (mobile agency).
2. One distinct strength of the Messengers environment is that it supports strong migration: the ability to migrate at arbitrary points within the mobile application.
3. The Messengers environment provides the hop statement, which defines when and where to migrate to a new destination.
4. After migration is complete, the Messengers agent restarts in the application at the point after the previous hop statement. The result is that the application moves to the data, rather than using a messaging protocol to move the data to the agent.
5. There are obvious advantages to this when the data set is large and the migration links are slow. The Messengers model provides what the authors call Navigational Programming and Distributed Sequential Computing (DSC).
6. What makes these concepts interesting is that they support a model of programming identical to the traditional flow of sequential programs. This makes programs easier to develop and understand.
7. Let's now look at an example of DSC using the Messengers environment. Listing 11.5 provides a simple program. Consider an application where, on a series of hosts, we manipulate large matrices held in those hosts' memory.
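The idea of moving the computation to the data can be sketched as follows. The hop is only simulated here by visiting host-local data in sequence, since the actual hop statement exists only in the Messengers environment; the host names and matrices are invented.

```python
# A hedged sketch of Distributed Sequential Computing: instead of moving
# large matrices to the agent, the agent "hops" from host to host, carrying
# only its small running state (the accumulator).

hosts = {
    "hostA": [[1, 2], [3, 4]],   # large matrix held in hostA's memory
    "hostB": [[5, 6], [7, 8]],   # large matrix held in hostB's memory
}

def visit_hosts(hosts):
    total = 0                    # small state that travels with the agent
    for name, matrix in hosts.items():
        # hop(name): computation now runs "at" this host, next to its data
        total += sum(sum(row) for row in matrix)
    return total

grand_total = visit_hosts(hosts)
```

Only the accumulator crosses the (simulated) migration links; the matrices never move, which is the advantage claimed above for large data sets and slow links.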
15. SOAR (HYBRID)
1. Soar, which originally was an acronym for State-Operator-And-Result, is a symbolic cognitive architecture.
2. Soar provides a model of cognition along with an implementation of that model for building general-purpose
AI systems.
3. The idea behind Soar comes from Newell's unified theories of cognition. Soar is one of the most widely used architectures, from research into aspects of human behaviour to the design of game agents for first-person shooter games.
4. The goal of the Soar architecture is to build systems that embody general intelligence. While Soar includes many elements that support this goal (for example, representing knowledge in procedural and declarative forms), it lacks some important aspects, including episodic memory and a model of emotion. Soar's underlying problem-solving mechanism is based on a production system (expert system).
5. Behaviour is encoded in rules of the if-then form. Solving problems in Soar can be most simply described as search through a problem space (to a goal node). If this model of problem solving fails, other methods are used, such as hill climbing.
6. When a solution is found, Soar uses a method called chunking to learn a new rule based on this discovery. If
the agent encounters the problem again, it can use the rule to select the action to take instead of performing
problem solving again.
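Chunking can be sketched as a cache of learned rules in front of a (toy) problem-space search; everything below is illustrative rather than Soar's actual machinery.

```python
# A minimal sketch of chunking: when search solves a problem, the result is
# stored as a new rule (chunk), so the same problem is later answered by a
# single rule lookup instead of repeated search.

chunks = {}  # learned rules: problem -> solution

def search_for_solution(problem):
    # stand-in for problem-space search; here just a toy computation
    return f"solved-{problem}"

def solve(problem):
    if problem in chunks:          # a chunk fires: no search needed
        return chunks[problem]
    solution = search_for_solution(problem)
    chunks[problem] = solution     # chunking: learn a new rule
    return solution

first = solve("stack-blocks")      # performs search, learns a chunk
second = solve("stack-blocks")     # answered directly from the learned chunk
```

The second call never enters the search step, which is exactly the speed-up chunking is meant to provide.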
8. Identity: When communication occurs among agents, its meaning depends on the identities and roles of the agents involved, and on how the involved agents are specified. A message might be sent to a particular agent, or to any agent satisfying a specified criterion.
9. Cardinality: A message sent privately to one agent is understood differently than the same message broadcast publicly.
3.4 NEGOTIATION
1. A frequent form of interaction that occurs among agents with different goals is termed negotiation.
2. Negotiation is a process by which a joint decision is reached by two or more agents, each trying to reach an
individual goal or objective. The agents first communicate their positions, which might conflict, and then try to
move towards agreement by making concessions or searching for alternatives.
3. The major features of negotiation are (1) the language used by the participating agents, (2) the protocol
followed by the agents as they negotiate, and (3) the decision process that each agent uses to determine its
positions, concessions, and criteria for agreement.
4. Many groups have developed systems and techniques for negotiation. These can be either environment-
centered or agent-centered. Developers of environment-centered techniques focus on the following problem:
"How can the rules of the environment be designed so that the agents in it, regardless of their origin,
capabilities, or intentions, will interact productively and fairly?"
The resultant negotiation mechanism should ideally have the following attributes:
• Efficiency: the agents should not waste resources in coming to an agreement.
• Stability: no agent should have an incentive to deviate from agreed- upon strategies.
• Simplicity: the negotiation mechanism should impose low computational, and bandwidth demands on
the agents.
• Distribution: the mechanism should not require a central decision maker.
• Symmetry: the mechanism should not be biased against any agent for arbitrary or inappropriate
reasons.
5. An articulate and entertaining treatment of these concepts is found in [36]. In particular, three types of
environments have been identified: worth-oriented domains, state-oriented domains, and task-oriented
domains.
6. A task-oriented domain is one where agents have a set of tasks to achieve, all resources needed to achieve
the tasks are available, and the agents can achieve the tasks without help or interference from each other.
However, the agents can benefit by sharing some of the tasks. An example is the "Internet downloading
domain," where each agent is given a list of documents that it must access over the Internet. There is a cost
associated with downloading, which each agent would like to minimize. If a document is common to several
agents, then they can save downloading cost by accessing the document once and then sharing it.
7. The environment might provide the following simple negotiation mechanism and
constraints:
(1) each agent declares the documents it wants,
(2) documents found to be common to two or more agents are assigned to agents based on the toss of a coin,
(3) agents pay for the documents they download, and
(4) agents are granted access to the documents they download, as well as any in their common sets.
This mechanism is simple, symmetric, distributed, and efficient (no document is downloaded twice). To determine stability, the agents' strategies must be considered.
8. An optimal strategy is for an agent to declare the true set of documents that it needs, regardless of what
strategy the other agents adopt or the documents they need. Because there is no incentive for an agent to
diverge from this strategy, it is stable.
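The cost savings in the Internet downloading domain can be checked with a small sketch (one cost unit per download is an assumption made for the example; the document names are invented):

```python
# The Internet downloading domain: each agent declares its document set;
# common documents are downloaded once and shared, so no document is
# fetched twice. Cost is assumed to be 1 unit per download.

agent_needs = {
    "agent1": {"doc_a", "doc_b", "doc_c"},
    "agent2": {"doc_b", "doc_c", "doc_d"},
}

# Acting alone, each agent downloads everything it needs itself.
cost_alone = sum(len(docs) for docs in agent_needs.values())

# Under the mechanism, each distinct document is downloaded exactly once.
all_docs = set().union(*agent_needs.values())
cost_shared = len(all_docs)

savings = cost_alone - cost_shared
```

The two shared documents (doc_b and doc_c) are each downloaded once instead of twice, so the joint saving equals the number of common documents.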
9. For the first approach, speech-act classifiers together with a possible world semantics are used to formalize
negotiation protocols and their components. This clarifies the conditions of satisfaction for different kinds of
messages. To provide a flavor of this approach, we show in the following example how the commitments that
an agent might make as part of a negotiation are formalized [21]:
10. This rule states that an agent forms and maintains its commitment to achieve φ individually iff (1) it has not precommitted itself to another agent to adopt and achieve φ, (2) it has a goal to achieve φ individually, and (3) it is willing to achieve φ individually. The chapter on "Formal Methods in DAI" provides more information on such descriptions.
11. The second approach assumes that the agents are economically rational. Further, the set of agents must
be small, they must have a common language and common problem abstraction, and they must reach a
common solution. Under these assumptions, Rosenschein and Zlotkin [37] developed a unified negotiation
protocol.
Agents that follow this protocol create a deal, that is, a joint plan between the agents that would satisfy all of
their goals. The utility of a deal for an agent is the amount he is willing to pay minus the cost of the deal. Each
agent wants to maximize its own utility.
The agents discuss a negotiation set, which is the set of all deals that have a positive utility for
every agent.
In formal terms, a task-oriented domain under this approach becomes a tuple <T, A, c>, where T is the set of tasks, A is the set of agents, and c(X) is a monotonic function giving the cost of executing the set of tasks X. A deal is a redistribution of tasks. The utility of deal d for agent k is U_k(d) = c(T_k) − c(d_k).
The conflict deal D occurs when the agents cannot reach a deal. A deal d is individually rational if d > D. Deal d is Pareto optimal if there is no deal d' > d. The set of all deals that are individually rational and Pareto optimal is the negotiation set, NS. There are three possible situations:
1. conflict: the negotiation set is empty
2. compromise: agents prefer to be alone, but since they are not, they will agree to a negotiated deal
3. cooperative: all deals in the negotiation set are preferred by both agents over achieving their goals alone.
When there is a conflict, then the agents will not benefit by negotiating—they are better off acting alone.
Alternatively, they can "flip a coin" to decide which agent gets to satisfy its goals.
Negotiation is the best alternative in the other two cases.
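A toy instance of U_k(d) = c(T_k) − c(d_k) in the downloading domain; the document names and the particular deal are illustrative:

```python
# A toy task-oriented domain <T, A, c>: tasks are documents to download,
# c(X) counts the distinct documents an agent downloads, and the utility
# of a deal is the cost an agent saves relative to acting alone.

T_A = {"d1", "d2"}          # documents agent A must access
T_B = {"d2", "d3"}          # documents agent B must access ("d2" is common)

def c(docs):
    return len(docs)        # cost: one unit per distinct downloaded document

def utility(downloads, needs):
    # U_k(d) = c(T_k) - c(d_k): cost saved relative to downloading alone
    return c(needs) - c(downloads)

# Deal d: A downloads d1 and the common d2; B downloads only d3, sharing d2.
deal = {"A": {"d1", "d2"}, "B": {"d3"}}

u_A = utility(deal["A"], T_A)   # A saves nothing under this deal
u_B = utility(deal["B"], T_B)   # B saves one download
```

Both utilities are non-negative, so the deal is individually rational; which of the rational deals the agents settle on is what the negotiation protocol decides.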
Since the agents have some execution autonomy, they can in principle deceive or mislead each other.
Therefore, an interesting research problem is to develop protocols or societies in which the effects of deception
and misinformation can be constrained. Another aspect of the research problem is to develop protocols under
which it is rational for agents to be honest with each other. The connections of the economic approaches with
human-oriented negotiation and argumentation have not yet been fully worked out.
3.5 BARGAINING
Link: http://www.cse.iitd.ernet.in/~rahul/cs905/lecture15/index.html (refer to this link for easy understanding)
A bargaining problem is defined as a pair (S, d). A bargaining solution is a function f that maps every bargaining problem (S, d) to an outcome in S, i.e., f : (S, d) → S.
Thus the solution to a bargaining problem is a point in R². It gives the values of the game to the two players and is generated through a function called the bargaining function.
The bargaining function maps the set of possible outcomes to the set of acceptable ones.
Bargaining Solution
• In a transaction when the seller and the buyer value a product differently, a surplus is created.
• A bargaining solution is then a way in which buyers and sellers agree to divide the surplus.
• For example, consider a house made by a builder A. It cost him Rs.10 Lacs. A potential buyer is interested in the house and values it at Rs.20 Lacs. This transaction can generate a surplus of Rs.10 Lacs. The builder and the buyer now need to trade at a price. The buyer knows that the cost is less than 20 Lacs, and the seller knows that the value is greater than 10 Lacs. The two of them need to agree on a price. Both try to maximize their surplus. The buyer would want to buy it for 10 Lacs, while the seller would like to sell it for 20 Lacs. They bargain on the price, and either trade or part ways.
• Trade would result in the generation of surplus, whereas no surplus is created in case of no-trade.
Bargaining Solution provides an acceptable way to divide the surplus among the two parties.
• Formally, a Bargaining Solution is defined as F : (X, d) → S, where X ⊆ R² and S, d ∈ R². X represents the utilities of the players in the set of possible bargaining agreements, and d represents the point of disagreement. In the above example, price ∈ [10, 20], and the bargaining set is simply x + y ≤ 10, x ≥ 0, y ≥ 0. A point (x, y) in the bargaining set represents the case where the seller gets a surplus of x and the buyer gets a surplus of y, i.e., the seller sells the house at 10 + x and the buyer pays 20 − y.
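The split of the surplus can be checked numerically; the agreed price of 14 Lacs below is just an illustrative choice:

```python
# Numeric check of the house example: cost 10, valuation 20, so any price
# p in [10, 20] splits the surplus of 10 into x = p - 10 for the seller
# and y = 20 - p for the buyer, with x + y = 10.

cost, value = 10, 20
price = 14                      # an illustrative agreed price

seller_surplus = price - cost   # x
buyer_surplus = value - price   # y
total_surplus = seller_surplus + buyer_surplus
```

Whatever price is agreed, the two surpluses always sum to the fixed total of 10; bargaining only decides how that total is divided.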
1. the set of payoff allocations that are jointly feasible for the two players in the process of negotiation or
arbitration, and
2. the payoffs they would expect if negotiation or arbitration were to fail to reach a settlement.
Based on these assumptions, Nash generated a list of axioms that a reasonable solution ought to satisfy. These
axioms are as follows:
Axiom 1 (Individual Rationality) This axiom asserts that the bargaining solution should give neither player less than what it would get from disagreement, i.e., f(S, d) ≥ d.
Axiom 2 (Symmetry) As per this axiom, the solution should be independent of the names of the players, i.e.,
who is named a and who is named b. This means that when the players’ utility functions and their disagreement
utilities are the same, they receive equal shares. So any symmetries in the final payoff should only be due to
the differences in their utility functions or their disagreement outcomes.
Axiom 3 (Strong Efficiency) This axiom asserts that the bargaining solution should be feasible and Pareto
optimal.
Axiom 4 (Invariance) According to this axiom, the solution should not change as a result of linear changes to
the utility of either player. So, for example, if a player’s utility function is multiplied by 2, this should not change
the solution. Only the player will value what it gets twice as much.
Axiom 5 (Independence of Irrelevant Alternatives) This axiom asserts that eliminating feasible alternatives (other than the disagreement point) that would not have been chosen should not affect the solution.
Nash proved that the bargaining solution satisfying the above five axioms is the point that maximizes the Nash product: f(S, d) = arg max_{(x, y) ∈ S, x ≥ d1, y ≥ d2} (x − d1)(y − d2).
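Nash's result can be verified numerically: the solution maximizes the product (x − d1)(y − d2) over the bargaining set. A sketch for the house example, with disagreement point (0, 0) and a simple grid search over the Pareto frontier (the grid search is just an illustrative method):

```python
# Maximize the Nash product (x - d1)(y - d2) over a discretized bargaining
# set. For the house example, feasible surpluses satisfy x + y <= 10 with
# x, y >= 0, and the disagreement point is d = (0, 0).

d1, d2 = 0.0, 0.0

best, best_product = None, -1.0
steps = 1000
for i in range(steps + 1):
    x = 10 * i / steps
    y = 10 - x                      # Pareto frontier of x + y <= 10
    product = (x - d1) * (y - d2)
    if product > best_product:
        best, best_product = (x, y), product
```

As the symmetry axiom predicts for identical disagreement utilities, the maximizer is the even split x = y = 5.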
3.6 ARGUMENTATION
➢ “A verbal and social activity of reason aimed at increasing (or decreasing) the acceptability of a controversial
standpoint for the listener or reader, by putting forward a constellation of propositions (i.e. arguments)
intended to justify (or refute) the standpoint before a rational judge”
➢ Argumentation can be defined as an activity aimed at convincing of the acceptability of a standpoint by
putting forward propositions justifying or refuting the standpoint.
➢ Argument: Reasons / justifications supporting a conclusion
➢ Represented as: support → conclusion
– Informational arguments: Beliefs → Belief. E.g., if it is cloudy, it might rain.
– Motivational arguments: Beliefs, Desires → Desire. E.g., if it is cloudy and you want to go out, then you don't want to get wet.
– Practical arguments: Beliefs, Sub-Goals → Goal. E.g., if it is cloudy and you own a raincoat, then put on the raincoat.
– Social arguments: Social commitments → Goal, Desire. E.g., I will stop at the corner because the law says so. E.g., I can't do that; I promised my mother that I won't.
Process of Argumentation
1. Constructing arguments (in favor of / against a “statement”) from available information.
A: “Tweety is a bird, so it flies”
B: “Tweety is just a cartoon!”
2. Determining the different conflicts among the arguments.
“Since Tweety is a cartoon, it cannot fly!” (B attacks A)
Evaluating the acceptability of the different arguments
“Since we have no reason to believe otherwise, we’ll assume Tweety is a cartoon.”
(Accept B). “But then, this means despite being a bird he cannot fly.” (Reject A).
3. Concluding or defining the justified conclusions.
“We conclude that Tweety cannot fly!”
Computational Models of Argumentation:
1. Given the definition of arguments over a content language (and its logic), the models allow one to:
• Compute interactions between arguments: attacks, defeat, support, ...
• Assign values to arguments: weights used to compare them, either the intrinsic value of an argument or an interaction-based value of an argument.
2. Selection of acceptable argument (conclusion)
• Individual acceptability
• Collective acceptability
3.7 TRUST & REPUTATION IN MULTI-AGENT SYSTEMS
It depends on the level at which we apply it:
1. User confidence
• Can we trust the user behind the agent?
– Is he/she a trustworthy source of some kind of knowledge? (e.g. an expert in a field)
– Does he/she act in the agent system (through his agents) in a trustworthy way?
2. Trust of users in agents
• Issues of autonomy: the more autonomy, the less trust
• How to create trust?
– Reliability testing for agents
– Formal methods for open MAS
– Security and verifiability
3. Trust of agents in agents
• Reputation mechanisms
• Contracts
• Norms and Social Structures
What is Trust?
1. In closed environments, cooperation among agents is included as part of the design process.
2. The multi-agent system is usually built by a single developer or a single team of developers, and the chosen option to reduce complexity is to ensure cooperation among the agents they build, including it as an important system requirement.
3. Benevolence assumption: an agent a_i requesting information or a certain service from agent a_j can be sure that a_j will answer if it has the capabilities and the resources needed; otherwise a_j will inform a_i that it cannot perform the action requested.
4. It can be said that in closed environments trust is implicit.
Trust can be computed as
1. A binary value (1 = 'I do trust this agent', 0 = 'I don't trust this agent')
2. A set of qualitative values or a discrete set of numerical values (e.g. 'trust always', 'trust conditional to X', 'no trust'; or '2', '1', '0', '-1', '-2')
3. A continuous numerical value (e.g. [-300, 300])
4. A probability distribution
5. Degrees over underlying beliefs and intentions (cognitive approach)
HOW TO COMPUTE TRUST
1. Trust values can be externally defined
• by the system designer: the trust values are pre-defined
• by the human user: he can introduce his trust values about the humans behind the other
agents
2. Trust values can be inferred from some existing representation about the interrelations
between the agents
• Communication patterns, cooperation history logs, e-mails, webpage connectivity
mapping...
3. Trust values can be learnt from current and past experiences
• Increase the trust value for agent a_i if it behaves properly with us
• Decrease the trust value for agent a_i if it fails or defects against us
4. Trust values can be propagated or shared through a MAS
• Recommender systems, Reputation mechanisms.
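Learning trust from experience can be sketched with a simple incremental update; the learning rate and the update scheme are assumptions for illustration, not a standard model from the literature:

```python
# Experience-based trust update: trust in [0, 1] moves toward 1 after a
# successful interaction and toward 0 after a failure, weighted by a
# learning rate that controls how fast old experience is forgotten.

def update_trust(trust, success, rate=0.2):
    target = 1.0 if success else 0.0
    return trust + rate * (target - trust)

t = 0.5                       # neutral prior about agent a_i
t = update_trust(t, True)     # a_i behaves properly: trust rises to 0.6
t = update_trust(t, False)    # a_i defects: trust falls to 0.48
```

A small rate makes trust slow to build and slow to destroy; a large rate makes the agent react sharply to the most recent interactions.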
TRUST AND REPUTATION
1. Most authors in the literature mix trust and reputation.
2. Some authors make a distinction between them:
3. Trust is an individual measure of confidence that a given agent has in other agent(s).
4. Reputation is a social measure of confidence that a group of agents or a society has in agents or groups. Reputation is one mechanism to compute (individual) trust.
• I will trust an agent more if it has a good reputation.
• My reputation clearly affects the amount of trust that others have towards me.
• Reputation can have a sanctioning role in social groups: a bad reputation can be very
costly to one’s future transactions.
5. Most authors combine (individual) Trust with some form of (social) Reputation in their models
6. Recommender systems, Reputation mechanisms.
5. Group-derived:
• Models for groups can be extended to provide prior reputation estimates for agents in social groups.
• Mapping between the initial individual reputation of a stranger and the group from which he or she comes.
• Problem: highly domain-dependent and model-dependent.
6. Propagated:
• An agent can attempt to estimate a stranger's reputation based on information garnered from others in the environment. This is also called word of mouth.
• Problem: the combination of the different reputation values tends to be an ad-hoc solution with no social basis.
TRUST AND REPUTATION MODELS
1. Not designed specifically for MAS, but applicable to MAS.
2. Idea: for serious life or business decisions, you want the opinion of a trusted expert.
3. If an expert is not personally known, then you want to find a reference to one via a chain of friends and colleagues.
4. A referral-chain provides:
• A way to judge the quality of the expert's advice
• A reason for the expert to respond in a trustworthy manner
• Finding good referral-chains is slow and time-consuming, but vital (hence the advice of business gurus on "networking")
• The set of all possible referral-chains = a social network
5. Model integrates information from
• Official organizational charts (online)
• Personal web pages (+ crawling)
• External publication databases
• Internal technical document databases
6. Builds a social network based in referral chains
• Each node is a recommender agent
• Each node provides reputation values for specific areas
o E.g. Frieze is good in mathematics
• Searches in the referral network are made by areas
o E.g. browsing the network’s “mathematics” recommendation chains
7. Trust Model Overview
• 1-to-1 asymmetric trust relationships.
• Direct trust and recommender trust.
• Trust categories and trust values [-1,0,1,2,3,4].
8. Conditional transitivity.
Alice trusts Bob & Bob trusts Cathy
⇒ Alice trusts Cathy
Alice trusts.rec Bob & Bob says Bob trusts Cathy
⇒ Alice may trust Cathy
Alice trusts.rec Bob value X & Bob says Bob trusts Cathy value Y
⇒ Alice may trust Cathy value f(X, Y)
9. Recommendation protocol
1. Alice -> Bob: RRQ(Eric)
2. Bob -> Cathy: RRQ(Eric)
3. Cathy -> Bob: Rec(Eric, 3)
4. Bob -> Alice: Rec(Eric, 3)
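The recommendation protocol can be sketched end to end; the combining rule f(X, Y) = min(X, Y) is an assumption chosen for illustration (real systems define their own f):

```python
# Sketch of the referral chain: Alice asks Bob about Eric; Bob has no
# direct value, forwards the RRQ to Cathy, and Cathy's Rec(Eric, 3) flows
# back to Alice, who discounts it by her recommender trust in Bob.

cathy_knowledge = {"Eric": 3}     # Cathy's direct reputation values
alice_rec_trust_in_bob = 4        # X: Alice's recommender trust in Bob

def f(x, y):
    return min(x, y)              # assumed combining rule f(X, Y)

def recommend(target):
    # Bob holds no direct value for the target, so he forwards the RRQ
    # to Cathy; her Rec(target, value) travels back to Alice through Bob.
    return cathy_knowledge[target]

y = recommend("Eric")                                # Rec(Eric, 3)
alice_value_for_eric = f(alice_rec_trust_in_bob, y)  # Alice's derived trust
```

With min as f, a weak recommender caps the value of any recommendation passing through it, which matches the "conditional transitivity" idea: trust degrades, rather than transfers intact, along the chain.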
Figure 3.11 Procedure
12. Direct Trust:
1. ReGreT assumes that there is no difference between direct interaction and direct observation in terms of the reliability of the information. It talks about direct experiences.
2. The basic element used to calculate direct trust is the outcome.
3. An outcome of a dialog between two agents can be either:
• An initial contract to take a particular course of action, and the actual result of the actions taken, or
• An initial contract to fix the terms and conditions of a transaction, and the actual values of the terms of the transaction.
13. Reputation Model: Witness reputation
a. First step to calculate a witness reputation is to identify the set of witnesses that will be considered
by the agent to perform the calculation.
b. The initial set of potential witnesses might be
i. the set of all agents that have interacted with the target agent in the past.
ii. This set, however, can be very big, and the information provided by its members probably suffers from the correlated evidence problem.
c. Next step is to aggregate these values to obtain a single value for the witness reputation.
The importance of each piece of information in the final reputation value will be proportional to the witness
credibility.
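Credibility-weighted aggregation can be sketched as a weighted average; the numbers are invented for the example:

```python
# Witness reputation as a credibility-weighted average: each witness's
# reported value is weighted by the agent's credibility estimate of that
# witness, so unreliable witnesses contribute less to the final value.

witness_reports = [
    {"value": 0.9, "credibility": 0.8},   # reliable witness
    {"value": 0.1, "credibility": 0.2},   # unreliable witness
]

def witness_reputation(reports):
    total_weight = sum(r["credibility"] for r in reports)
    return sum(r["value"] * r["credibility"] for r in reports) / total_weight

rep = witness_reputation(witness_reports)
```

The final value (0.74) sits much closer to the reliable witness's report than a plain average (0.5) would, which is the intended effect of weighting by credibility.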
14. Reputation Model: Witness reputation
a. Two methods to evaluate witness credibility:
i. ReGreT uses fuzzy rules to calculate how the structure of social relations influences the
credibility on the information. The antecedent of each rule is the type and degree of a social
relation (the edges in a sociogram) and the consequent is the credibility of the witness from
the point of view of that social relation.
ii. The second method used in the ReGreT system to calculate the credibility of a witness is to evaluate the accuracy of previous pieces of information sent by that witness to the agent. The agent uses the direct trust value to measure the truthfulness of the information received from witnesses.
15. Reputation Model: Neighbourhood Reputation
a. Neighbourhood in a MAS is not related with the physical location of the agents but with the links
created through interaction.
b. The main idea is that the behaviour of these neighbours and the kind of relation they have with the
target agent can give some clues about the behaviour of the target agent.
c. To calculate a Neighbourhood Reputation the ReGreT system uses fuzzy rules.
i. The antecedents of these rules are one or several direct trusts associated to different
behavioural aspects and the relation between the target agent and the neighbour.
ii. The consequent is the value for a concrete reputation (that can be associated to the same
behavioural aspect of the trust values or not).
16. Reputation Model: System Reputation
a. To use the common knowledge about social groups and the role that the agent is playing in the
society as a mechanism to assign default reputations to the agents.
b. ReGreT assumes that the members of these groups have one or several observable features that
unambiguously identify their membership.
c. Each time an agent performs an action we consider that it is playing a single role.
i. E.g. an agent can play the role of buyer and seller but when it is selling a product only the role of
seller is relevant.
17. System reputations are calculated using a table for each social group where the rows are the roles the
agent can play for that group, and the columns the behavioural aspects.
18. Reputation Model: Default Reputation
a. To the previous reputation types, we must add a fourth one, the reputation assigned to a third-party
agent when there is no information at all: the default reputation.
b. Usually this will be a fixed value.
19. Reputation Model: Combining reputations
a. Each reputation type has different characteristics, and there are many heuristics that can be used to aggregate the four reputation values to obtain a single, representative reputation value.
b. In ReGreT this heuristic is based on the default and calculated reliability assigned to each type.
c. Assuming we have enough information to calculate all the reputation types, the stance is that:
i. witness reputation is the first type that should be considered, followed by
ii. the neighbourhood reputation,
iii. the system reputation, and
iv. the default reputation.
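One plausible way to realize this ordering is a reliability-weighted combination with a fallback to the default reputation; this is an assumed scheme for illustration, not ReGreT's exact formula, and the numbers are invented:

```python
# Combine the four reputation types by reliability: the final value is a
# reliability-weighted mix, falling back to the default reputation when no
# type carries any reliable information.

reputation = {
    "witness":       {"value": 0.8, "reliability": 0.7},
    "neighbourhood": {"value": 0.6, "reliability": 0.2},
    "system":        {"value": 0.5, "reliability": 0.1},
    "default":       {"value": 0.3, "reliability": 0.0},
}

def combine(rep):
    total = sum(r["reliability"] for r in rep.values())
    if total == 0:
        return rep["default"]["value"]   # no information: default reputation
    return sum(r["value"] * r["reliability"] for r in rep.values()) / total

final = combine(reputation)
```

Because witness reputation carries the highest reliability here, it dominates the result, mirroring the stated order of preference.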
20. Main criticism to Trust and Reputation research:
a. Proliferation of ad-hoc models weakly grounded in social theory.
b. No general, cross-domain model for reputation
c. Lack of integration between models
i. Comparison between models unfeasible
ii. Researchers are trying to solve this by, e.g. the ART competition
3.8 LANGUAGE MODELS
• Language can be defined as a set of strings;
• “print(2+2)” is a legal program in the language Python, where “2) + (2 print” is not. Since they are an
infinite number of legal programs, they cannot be enumerated; instead, they are specified by a set of
rules called a grammar. Formal languages also have rules that defined the meaning semantics of a
program.
• for example, the rules say that the “meaning” of “2 + 2” is 4, and the meaning of “1/0” is that an error
is signated.
1. Natural languages, such an English or Spanish, cannot be characterized as a definite set of sentences.
Example: Everyone agrees that “Not to be invited is sad” is a sentence of English, but people disagree on the
grammatically of “To be not invited is said”.
Therefore, it is more fruitful to define a natural language model as a probability distribution over sentences
rather than a definitive set. That is, rather than asking whether a string of words is or is not a member of the
set defining the language, we instead ask for P(S = words): what is the probability that a random sentence
would be that string of words?
Natural languages are also ambiguous. “He saw her duck” can mean either that he saw a waterfowl belonging
to her, or that he saw her move to evade something. Thus, again, we cannot speak of a single meaning for a
sentence, but rather of a probability distribution over possible meanings.
2. Finally, natural languages are difficult to deal with because they are very large and constantly changing. Thus,
our language models are, at best, an approximation. We start with the simplest possible approximation and move
up from there.
Language modelling (LM) is the use of various statistical and probabilistic techniques to determine the
probability of a given sequence of words occurring in a sentence. Language models analyze bodies of text data
to provide a basis for their word predictions. They are used in natural language processing (NLP) applications,
particularly ones that generate text as an output. Some of these applications include machine translation and
question answering.
There are several different probabilistic approaches to modeling language, which vary depending on the
purpose of the language model. From a technical perspective, the various types differ by the amount of text
data they analyze and the math they use to analyze it. For example, a language model designed to generate
sentences for an automated Twitter bot may use different math and analyze text data in a different way than
a language model designed for determining the likelihood of a search query.
• N-gram. N-grams are a relatively simple approach to language models. They create a probability
distribution for a sequence of n words. The n can be any number, and defines the size of the "gram", or
sequence of words being assigned a probability. For example, if n = 5, a gram might look like this:
"can you please call me." The model then assigns probabilities using sequences of size n.
Basically, n can be thought of as the amount of context the model is told to consider. Some types
of n-grams are unigrams, bigrams, trigrams and so on.
N-gram Language Model:
An N-gram language model predicts the probability of a given N-gram within any sequence of words in the
language. A good N-gram model can predict the next word in a sentence, i.e. the value of p(w|h), where h is
the history of preceding words.
Example of N-gram such as unigram (“This”, “article”, “is”, “on”, “NLP”) or bi-gram (‘This article’, ‘article is’,
‘is on’,’on NLP’).
Now, we will establish a relation on how to find the next word in the sentence using the previous words (the
history h). We need to calculate p(w|h), where w is the candidate for the next word. For example, in the above
example, let’s consider we want to calculate the probability of the last word being “NLP” given the previous
words:

p(“NLP” | “This article is on”)

But how do we calculate it? The answer lies in the chain rule of probability:

p(w1, w2, ..., wn) = p(w1) p(w2 | w1) p(w3 | w1 w2) ... p(wn | w1 ... wn-1)

Because such long histories are too sparse to count, an N-gram model approximates them with a short history
and estimates the probabilities from counts:

• For unigram: p(wi) = count(wi) / (total number of words)
• For bigram: p(wi | wi-1) = count(wi-1 wi) / count(wi-1)
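These unigram and bigram counting estimates can be sketched in a few lines of Python. The tiny corpus below is assumed purely for illustration; a real model would be trained on a large body of text:

```python
from collections import Counter

# Toy corpus, split into a list of word tokens.
corpus = "this article is on NLP this article is short".split()

unigrams = Counter(corpus)                       # count(w)
bigrams = Counter(zip(corpus, corpus[1:]))       # count(w_prev w)

def p_unigram(w):
    # p(w) = count(w) / total number of words
    return unigrams[w] / len(corpus)

def p_bigram(w, prev):
    # p(w | prev) = count(prev w) / count(prev)
    return bigrams[(prev, w)] / unigrams[prev]

print(p_bigram("article", "this"))  # 1.0: every "this" is followed by "article"
```

Multiplying such bigram probabilities along a sentence gives its overall probability under the model, exactly as the chain-rule approximation above prescribes.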
The query likelihood model, for example, applies the n-gram approach in
information retrieval to examine a pool of documents and match the most relevant one to a specific
query.
• Bidirectional. Unlike n-gram models, which analyze text in one direction (backwards), bidirectional
models analyze text in both directions, backwards and forwards. These models can predict any
word in a sentence or body of text by using every other word in the text. Examining text
bidirectionally increases result accuracy. This type is often utilized in machine learning and speech
generation applications. For example, Google uses a bidirectional model to process search queries.
• Exponential. Also known as maximum entropy models, this type is more complex than n-grams.
Simply put, the model evaluates text using an equation that combines feature functions and n-
grams. Basically, this type specifies features and parameters of the desired results, and unlike n-
grams, leaves analysis parameters more ambiguous -- it doesn't specify individual gram sizes, for
example. The model is based on the principle of entropy, which states that the probability
distribution with the most entropy is the best choice. In other words, the model with the most
chaos, and least room for assumptions, is the most accurate. Exponential models are designed to
maximize cross entropy, which minimizes the number of statistical assumptions that can be made.
This enables users to better trust the results they get from these models.
• Continuous space. This type of model represents words as a non-linear combination of weights in
a neural network. The process of assigning a weight to a word is also known as word embedding.
This type becomes especially useful as data sets get increasingly large, because larger datasets
often include more unique words. The presence of a lot of unique or rarely used words can cause
problems for linear models like n-grams. This is because the number of possible word sequences
increases, and the patterns that inform results become weaker. By weighting words in a non-linear,
distributed way, this model can "learn" to approximate words and therefore not be misled by any
unknown values. Its "understanding" of a given word is not as tightly tethered to the immediate
surrounding words as it is in n-gram models.
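As an illustration of the continuous-space idea, here is a minimal sketch in which words are represented by vectors and compared with cosine similarity. The three-dimensional embeddings below are made up for illustration; real models learn high-dimensional vectors with a neural network:

```python
import math

# Hypothetical 3-dimensional word embeddings; real models learn hundreds
# of dimensions from data.
embedding = {
    "king":  [0.9, 0.1, 0.4],
    "queen": [0.8, 0.2, 0.5],
    "apple": [0.1, 0.9, 0.1],
}

def cosine(u, v):
    # Cosine similarity: dot product of the vectors over their lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Related words get similar vectors, so their cosine similarity is higher.
print(cosine(embedding["king"], embedding["queen"]) >
      cosine(embedding["king"], embedding["apple"]))  # True
```

Because similarity is computed in the vector space rather than from exact word matches, a rare or unseen word that lands near known words can still be handled sensibly, which is the advantage over n-gram counts described above.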
The models listed above are more general statistical approaches from which more specific variant language
models are derived. For example, as mentioned in the n-gram description, the query likelihood model is a more
specific or specialized model that uses the n-gram approach. Model types may be used in conjunction with one
another.
The models listed also vary significantly in complexity. Broadly speaking, more complex language models are
better at NLP tasks, because language itself is extremely complex and always evolving. Therefore, an
exponential model or continuous space model might be better than an n-gram for NLP tasks, because they are
designed to account for ambiguity and variation in language.
A good language model should also be able to process long-term dependencies, handling words that may derive
their meaning from other words that occur in far-away, disparate parts of the text. An LM should be able to
understand when a word is referencing another word from a long distance, as opposed to always relying on
proximal words within a certain fixed history. This requires a more complex model.
Language modeling is used directly in a variety of industries including tech, finance, healthcare, transportation, legal, military and
government. Additionally, it's likely most people reading this have interacted with a language model in some
way at some point in the day, whether it be through Google search, an autocomplete text function or engaging
with a voice assistant.
The roots of language modeling as it exists today can be traced back to 1948. That year, Claude Shannon
published a paper titled "A Mathematical Theory of Communication." In it, he detailed the use of a stochastic
model called the Markov chain to create a statistical model for the sequences of letters in English text. This
paper had a large impact on the telecommunications industry and laid the groundwork for information theory and
language modeling. The Markov model is still used today, and n-grams specifically are tied very closely to the
concept.
• Speech recognition -- involves a machine being able to process speech audio. This is commonly used
by voice assistants like Siri and Alexa.
• Machine translation -- involves the translation of one language to another by a machine. Google
Translate and Microsoft Translator are two programs that do this. SDL Government is another,
which is used to translate foreign social media feeds in real time for the U.S. government.
• Parts-of-speech tagging -- involves the markup and categorization of words by certain grammatical
characteristics. This is utilized in the study of linguistics, first and perhaps most famously in the
study of the Brown Corpus, a body of text composed of random English prose that was designed to be
studied by computers. This corpus has been used to train several important language models,
including one used by Google to improve search quality.
• Parsing -- involves analysis of any string of data or sentence that conforms to formal grammar and
syntax rules. In language modeling, this may take the form of sentence diagrams that depict each
word's relationship to the others. Spell checking applications use language modeling and parsing.
• Sentiment analysis -- involves determining the sentiment behind a given phrase. Specifically, it can
be used to understand opinions and attitudes expressed in a text. Businesses can use this to analyze
product reviews or general posts about their product, as well as analyze internal data like employee
surveys and customer support chats. Some services that provide sentiment analysis tools are
Repustate and Hubspot's ServiceHub. Google's NLP tool -- called Bidirectional Encoder
Representations from Transformers (BERT) -- is also used for sentiment analysis.
• Optical character recognition -- involves the use of a machine to convert images of text into
machine encoded text. The image may be a scanned document or document photo, or a photo
with text somewhere in it -- on a sign, for example. It is often used in data entry when processing
old paper records that need to be digitized. It can also be used to analyze and identify handwriting
samples.
• Information retrieval -- involves searching in a document for information, searching for documents
in general, and searching for metadata that corresponds to a document. Web browsers are the
most common information retrieval applications.
3.9 INFORMATION RETRIEVAL
Information Retrieval (IR) can be defined as a software program that deals with the organization, storage,
retrieval, and evaluation of information from document repositories, particularly textual information.
Information retrieval is the activity of obtaining material, usually documents of an unstructured nature
(i.e. usually text), that satisfies an information need from within large collections stored on
computers. For example, an information retrieval process begins when a user enters a query into the system.
Not only librarians, professional searchers, etc engage themselves in the activity of information retrieval but
nowadays hundreds of millions of people engage in IR every day when they use web search engines.
Information Retrieval is believed to be the dominant form of Information access. The IR system assists the
users in finding the information they require but it does not explicitly return the answers to the question. It
notifies regarding the existence and location of documents that might consist of the required information.
Information retrieval also extends support to users in browsing or filtering document collection or processing
a set of retrieved documents. The system searches over billions of documents stored on millions of
computers. Email programs, for example, provide a spam filter with manual or automatic means for classifying
mail so that it can be placed directly into particular folders.
An IR system can represent, store, organize, and access information items. A set of keywords is required to
search. Keywords are what people are searching for in search engines. These keywords summarize the
description of the information.
What is an IR Model?
An Information Retrieval (IR) model selects and ranks the documents that the user has asked for in the form
of a query. The documents and the queries are represented in a similar manner, so
that document selection and ranking can be formalized by a matching function that returns a retrieval status
value (RSV) for each document in the collection. Many of the Information Retrieval systems represent
document contents by a set of descriptors, called terms, belonging to a vocabulary V. An IR model determines
the query-document matching function according to four main approaches, for example the estimation of the
probability of the user’s relevance rel for each document d and query q with respect to a set Rq of training
documents: Prob(rel | d, q, Rq)
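A deliberately simple matching function that returns a retrieval status value for each document can be sketched as follows. The two-document collection and the term-frequency scoring below are assumptions for illustration, not a specific model from the text:

```python
from collections import Counter

# Toy collection; each document is represented by its bag of terms.
docs = {
    "d1": "information retrieval ranks documents by relevance",
    "d2": "robots use sensors to perceive the environment",
}

def rsv(query, doc_text):
    # Retrieval status value: sum of term-frequency matches between the
    # query terms and the document terms (a deliberately simple model).
    doc_terms = Counter(doc_text.split())
    return sum(doc_terms[t] for t in query.split())

query = "information retrieval"
ranked = sorted(docs, key=lambda d: rsv(query, docs[d]), reverse=True)
print(ranked[0])  # d1
```

Because queries and documents are both reduced to term multisets, ranking becomes a matter of sorting documents by this matching score, which is exactly the role of the RSV described above.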
Types of IR Models
Components of Information Retrieval/ IR Model
• Acquisition: In this step, the selection of documents and other objects from various web
resources that consist of text-based documents takes place. The required data is collected by web
crawlers and stored in the database.
• Representation: It consists of indexing that contains free-text terms, controlled vocabulary, and
manual and automatic techniques. For example, abstracting involves summarizing, and
bibliographic description includes author, title, source, date, and metadata.
• File Organization: There are two types of file organization methods. Sequential: it stores the
documents document by document. Inverted: it stores the data term by term, with a list of records
under each term. A combination of both may also be used.
• Query: An IR process starts when a user enters a query into the system. Queries are formal
statements of information needs, for example, search strings in web search engines. In
information retrieval, a query does not uniquely identify a single object in the collection. Instead,
several objects may match the query, perhaps with different degrees of relevancy.
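The inverted file organization described above can be sketched as follows; the tiny document collection is hypothetical:

```python
from collections import defaultdict

docs = {
    1: "the bird pecks the grains",
    2: "the grains are stored",
}

# Inverted file: term -> list of document ids containing that term.
inverted = defaultdict(list)
for doc_id, text in docs.items():
    for term in set(text.split()):
        inverted[term].append(doc_id)

def search(query):
    # A conjunctive query intersects the posting lists of its terms.
    postings = [set(inverted[t]) for t in query.split()]
    return sorted(set.intersection(*postings)) if postings else []

print(search("grains"))       # [1, 2]
print(search("bird grains"))  # [1]
```

Note that, as the Query component says, several objects may match a query: "grains" matches both documents, while adding "bird" narrows the result.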
User Interaction With Information Retrieval System
The User Task: The information need first has to be translated into a query by the user. In an information
retrieval system, the query is a set of words that convey the semantics of the information that is required,
whereas in a data retrieval system, a query expression is used to convey the constraints that must be satisfied
by the objects. Example: a user may start out searching for one thing but end up looking through related
material instead; this means that the user is browsing and not searching. The above figure shows the
interaction of the user through different tasks.
• Logical View of the Documents: Historically, documents were represented through a small set of
index terms or keywords. Nowadays, modern computers can represent documents by their full set of
words, from which a reduced set of representative keywords is derived. This can be done by
eliminating stopwords, i.e. articles and connectives. These text operations reduce the complexity
of the document representation from full text to a set of index terms.
1. Early Developments: As the need for information increased, it became necessary to build data structures
to provide faster access. The index is the data structure used for faster retrieval of information.
Over centuries, manual categorization into hierarchies was done for indexes.
2. Information Retrieval in Libraries: Libraries were the first to adopt IR systems for information retrieval. The
first generation consisted of the automation of previous technologies, and search was based on author
name and title. The second generation added searching by subject heading, keywords, etc. The third
generation introduced graphical interfaces, electronic forms, hypertext features, etc.
3. The Web and Digital Libraries: The Web is cheaper than many other sources of information, it provides
greater access to networks due to digital communication, and it gives free access to publish on a larger medium.
3. A typical relational-based extraction system is FASTUS, which handles news stories about corporate mergers
and acquisitions.
4. A relational extraction system can be built as a series of cascaded finite-state transducers.
5. That is, the system consists of a series of small, efficient finite-state automata (FSAs), where each automaton
receives text as input, transduces the text into a different format, and passes it along to the next automaton.
FASTUS consists of five stages:
1. Tokenization
2. Complex-word handling
3. Basic-group handling
4. Complex-phrase handling
5. Structure merging
FASTUS’s first stage is tokenization, which segments the stream of characters into tokens (words, numbers, and
punctuation). For English, tokenization can be simple; just separating characters at white space or punctuation
does a fairly good job.
Some tokenizers also deal with markup languages such as HTML, SGML, and XML.
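Such a simple whitespace-and-punctuation tokenizer can be sketched with a single regular expression; the sample sentence is an arbitrary example:

```python
import re

# Segment a character stream into word, number, and punctuation tokens.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Bridgestone Sports Co. set up a joint venture."))
# ['Bridgestone', 'Sports', 'Co', '.', 'set', 'up', 'a', 'joint', 'venture', '.']
```

Real tokenizers handle more edge cases (abbreviations, markup, numbers with internal punctuation), but this captures the basic idea of the first FASTUS stage.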
7. The second stage handles complex words, including collocations such as “set up” and “joint venture,” as well
as proper names such as “Bridgestone Sports Co.” These are recognized by a combination of lexical entries and
finite-state grammar rules.
8. The third stage handles basic groups, meaning noun groups and verb groups. The idea is to chunk these into
units that will be managed by the later stages.
9. The fourth stage combines the basic groups into complex phrases. Again, the aim is to have rules that are
finite-state and thus can be processed quickly, and that result in unambiguous (or nearly unambiguous) output
phrases. One type of combination rule deals with domain-specific events.
10. The final stage merges structures that were built up in the previous step. If the next sentence says “The
joint venture will start production in January,” then this step will notice that there are two references to a joint
venture, and that they should be merged into one. This is an instance of the identity uncertainty problem.
PROBABILISTIC MODELS FOR INFORMATION EXTRACTION
1. The simplest probabilistic model for sequences with hidden state is the hidden Markov
model, or HMM.
2. HMMs have two big advantages over FSAs for extraction.
• First, HMMs are probabilistic, and thus tolerant to noise.
• In a regular expression, if a single expected character is missing, the regex fails to match; with HMMs
there is graceful degradation with missing characters/words, and we get a probability indicating the
degree of match, not just a Boolean match/fail.
• Second, HMMs can be trained from data; they don’t require laborious engineering of templates, and
thus they can more easily be kept up to date as text changes over time.
Figure. Hidden Markov model for the speaker of a talk announcement. The two square states are the target
(note the second target state has a self-loop, so the target can match a string of any length), the four circles to
the left are the prefix, and the one on the right is the postfix. For each state, only a few of the high-probability
words are shown. From Freitag and McCallum (2000)
3. Once the HMMs have been learned, we can apply them to a text, using the Viterbi algorithm to find the
most likely path through the HMM states. One approach is to apply each attribute HMM separately; in this case
you would expect most of the HMMs to spend most of their time in background states. This is appropriate
when the extraction is sparse - when the number of extracted words is small compared to the length of the
text.
4. The other approach is to combine all the individual attributes into one big HMM, which
would then find a path that wanders through different target attributes, first finding a speaker target, then a
date target, etc. Separate HMMs are better when we expect just one of each attribute in a text and one big
HMM is better when the texts are more free-form and dense with attributes.
5. HMMs have the advantage of supplying probability numbers that can help make the choice. If some targets
are missing, we need to decide if this is an instance of the desired relation at all, or if the targets found are false
positives. A machine learning algorithm can be trained to make this choice.
ONTOLOGY EXTRACTION FROM LARGE CORPORA
1. A different application of extraction technology is building a large knowledge base or ontology of facts from
a corpus. This is different in three ways:
• First it is open-ended—we want to acquire facts about all types of domains, not just one specific domain.
• Second, with a large corpus, this task is dominated by precision, not recall—just as with question
answering on the Web.
• Third, the results can be statistical aggregates gathered from multiple sources, rather than being
extracted from one specific text.
2. Here is one of the most productive templates: NP such as NP (, NP)* (,)? ((and | or) NP)?
3. Here the bold words and commas must appear literally in the text, but the parentheses are for grouping,
the asterisk means repetition of zero or more, and the question mark means optional.
4. NP is a variable standing for a noun phrase
5. This template matches the texts “diseases such as rabies affect your dog” and “supports network protocols
such as DNS,” concluding that rabies is a disease and DNS is a network protocol.
6. Similar templates can be constructed with the key words “including,” “especially,” and “or other.” Of course
these templates will fail to match many relevant passages, like “Rabies is a disease.” That is intentional.
7. The “NP is a NP” template does indeed sometimes denote a subcategory relation, but it often means
something else, as in “There is a God” or “She is a little tired.” With a large corpus we can afford to be picky; to
use only the high-precision templates.
8. We’ll miss many statements of a subcategory relationship, but most likely we’ll find a paraphrase of the
statement somewhere else in the corpus in a form we can use.
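A toy version of the "NP such as NP" template can be sketched with a regular expression. Here a noun phrase is crudely approximated by one or two words; real systems match parsed noun phrases rather than raw text:

```python
import re

# Toy "NP such as NP" template: group 1 is the category, group 2 the member.
# An NP is approximated here by one or two word tokens.
pattern = re.compile(r"(\w+(?: \w+)?) such as (\w+)")

def extract(text):
    m = pattern.search(text)
    return (m.group(2), m.group(1)) if m else None  # (member, category)

print(extract("diseases such as rabies affect your dog"))
# ('rabies', 'diseases')
print(extract("supports network protocols such as DNS"))
# ('DNS', 'network protocols')
```

As the text notes, such high-precision templates deliberately miss paraphrases like "Rabies is a disease"; with a large corpus, the same fact is likely to appear elsewhere in a matchable form.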
3.11 NATURAL LANGUAGE PROCESSING
Natural Language Processing (NLP) refers to the AI method of communicating with intelligent systems using a
natural language such as English.
Processing of natural language is required when you want an intelligent system like a robot to perform as per
your instructions, when you want to hear a decision from a dialogue-based clinical expert system, etc.
The field of NLP involves making computers perform useful tasks with the natural languages humans use.
The input and output of an NLP system can be −
• Speech
• Written Text
Components of NLP
Difficulties in NLU
NLP Terminology
• Semantics − It is concerned with the meaning of words and how to combine words into
meaningful phrases and sentences.
• Pragmatics − It deals with using and understanding sentences in different situations and how
the interpretation of the sentence is affected.
• Discourse − It deals with how the immediately preceding sentence can affect the interpretation
of the next sentence.
• World Knowledge − It includes the general knowledge about the world.
Steps in NLP
• Semantic Analysis − It draws the exact meaning or the dictionary meaning from the text. The
text is checked for meaningfulness. It is done by mapping syntactic structures onto objects in
the task domain. The semantic analyzer disregards sentences such as “hot ice-cream”.
• Discourse Integration − The meaning of any sentence depends upon the meaning of the
sentence just before it. In addition, it also brings about the meaning of immediately succeeding
sentence.
• Pragmatic Analysis − During this, what was said is re-interpreted to determine what it actually meant. It
involves deriving those aspects of language which require real-world knowledge.
There are several algorithms researchers have developed for syntactic analysis, but we consider only the
following simple methods −
• Context-Free Grammar
• Top-Down Parser
Let us see them in detail −
Context-Free Grammar
It is a grammar whose rewrite rules have a single symbol on the left-hand side. Let us
create a grammar to parse a sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily understand and
process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which describe
what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols. The first
rule says that if there are two strings, a Noun Phrase (NP) and a Verb Phrase (VP), then the string formed
by NP followed by VP is a sentence. The rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree can be created as shown −
Now consider the above rewrite rules. Since V can be replaced by both "peck" and "pecks", sentences such as
"The bird peck the grains" are wrongly permitted, i.e. the subject-verb agreement error is accepted as
correct.
Merit − The simplest style of grammar, therefore a widely used one.
Demerits −
• They are not highly precise. For example, “The grains peck the bird” is syntactically correct
according to the parser, but even though it makes no sense, the parser takes it as a correct sentence.
• To bring out high precision, multiple sets of grammar need to be prepared. It may require a
completely different set of rules for parsing singular and plural variations, passive sentences,
etc., which can lead to the creation of a huge set of rules that is unmanageable.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that
matches the classes of the words in the input sentence until it consists entirely of terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process is started over again
with a different set of rules. This is repeated until a specific rule is found which describes the structure of the
sentence.
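The grammar and lexicon given earlier can be parsed top-down with a short recursive-descent sketch. This is a naive implementation for illustration; it also exhibits the subject-verb agreement flaw noted above:

```python
# Recursive-descent (top-down) parser for the toy grammar in the text:
#   S -> NP VP,  NP -> DET N | DET ADJ N,  VP -> V NP
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}
lexicon = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N":   {"bird", "birds", "grain", "grains"},
    "V":   {"peck", "pecks", "pecking"},
}

def parse(symbols, words):
    # Try to rewrite the symbol list so it exactly covers the word list.
    if not symbols:
        return not words
    head, rest = symbols[0], symbols[1:]
    if head in lexicon:  # terminal class: must match the next input word
        return bool(words) and words[0] in lexicon[head] and parse(rest, words[1:])
    # Nonterminal: try each expansion in turn (restart with a different rule).
    return any(parse(expansion + rest, words) for expansion in grammar[head])

print(parse(["S"], "the bird pecks the grains".split()))  # True
print(parse(["S"], "the bird peck the grains".split()))   # True (agreement error accepted)
```

The second call shows the demerit discussed earlier: because "peck" and "pecks" are both listed under V, the ungrammatical "The bird peck the grains" is accepted.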
Merit − It is simple to implement.
Demerits −
• It is inefficient, as the search process has to be repeated if an error occurs.
• It works slowly.
3.13 SPEECH RECOGNITION
Definition: Speech recognition is the task of identifying a sequence of words uttered by a speaker, given
the acoustic signal. It has become one of the mainstream applications of AI.
1. Example: The phrase “recognize speech” sounds almost the same as “wreck a nice beach” when spoken
quickly. Even this short example shows several of the issues that make speech problematic.
2. First, segmentation: written words in English have spaces between them, but in fast speech there are no
pauses in “wreck a nice” that would distinguish it as a multiword phrase as opposed to the single word
“recognize”.
3. Second, coarticulation: when speaking quickly, the “s” sound at the end of “nice” merges with the “b” sound
at the beginning of “beach”, yielding something that is close to “sp”. Another problem that does not show up
in this example is homophones – words like “to”, “too” and “two” that sound the same but differ in meaning.
4. Once we define the acoustic and language models, we can solve for the most likely sequence of words using
the Viterbi algorithm.
Acoustic Model
1. An analog-to-digital converter measures the size of the current, which approximates the amplitude of the
sound wave, at discrete intervals called the sampling rate.
2. The precision of each measurement is determined by the quantization factor; speech recognizers typically
keep 8 to 12 bits. That means that a low-end system, sampling at 8 kHz with 8-bit quantization, would require
nearly half a megabyte per minute of speech.
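That half-megabyte figure can be checked directly:

```python
# Storage needed for one minute of speech at 8 kHz with 8-bit quantization.
sampling_rate = 8000        # samples per second
bits_per_sample = 8
seconds = 60

bytes_per_minute = sampling_rate * (bits_per_sample // 8) * seconds
print(bytes_per_minute)             # 480000 bytes
print(bytes_per_minute / 2**20)     # about 0.46 MB, nearly half a megabyte
```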
3. A phoneme is the smallest unit of sound that has a distinct meaning to speakers of a particular language.
For example, the “t” in “stick” sounds similar enough to the “t” in “tick” that speakers of English consider them
the same phoneme.
Each frame is summarized by a vector of features. Below picture represents phone model.
Figure. Translating the acoustic signal into a sequence of frames. In this diagram each frame is described by the
discretized values of three acoustic features; a real system would have dozens of features.
3.14 ROBOT
1. Robots are physical agents that perform tasks by manipulating the physical world.
2. To do so, they are equipped with effectors such as legs, wheels, joints, and grippers.
3. Effectors have a single purpose: to assert physical forces on the environment.
4. Robots are also equipped with sensors, which allow them to perceive their environment.
5. Present day robotics employs a diverse set of sensors, including cameras and lasers to measure the
environment, and gyroscopes and accelerometers to measure the robot’s own motion.
6. Most of today’s robots fall into one of three primary categories. Manipulators, or robot arms, are physically
anchored to their workplace, for example in a factory assembly line or on the International Space Station.
Robot Hardware
1. Sensors are the perceptual interface between robot and environment.
2. Passive sensors, such as cameras, are true observers of the environment: they capture signals that are
generated by other sources in the environment.
3. Active sensors, such as sonar, send energy into the environment. They rely on the fact that this energy is
reflected to the sensor. Active sensors tend to provide more information than passive sensors, but at the
expense of increased power consumption and with a danger of interference when multiple active sensors are
used at the same time. Whether active or passive, sensors can be divided into three types, depending on
whether they sense the environment, the robot’s location, or the robot’s internal configuration.
4. Range finders are sensors that measure the distance to nearby objects. In the early days of robotics, robots
were commonly equipped with sonar sensors. Sonar sensors emit directional sound waves, which are reflected
by objects, with some of the sound making it back to the sensor.
5. Stereo vision relies on multiple cameras to image the environment from slightly different viewpoints,
analyzing the resulting parallax in these images to compute the range of surrounding objects. For mobile ground
robots, sonar and stereo vision are now rarely used, because they are not reliably accurate.
6. Other range sensors use laser beams and special 1-pixel cameras that can be directed using complex
arrangements of mirrors or rotating elements. These sensors are called scanning lidars (short for light detection
and ranging).
7. Other common range sensors include radar, which is often the sensor of choice for UAVs. Radar sensors can
measure distances of multiple kilometers. On the other extreme end of range sensing are tactile sensors such
as whiskers, bump panels, and touch-sensitive skin. These sensors measure range based on physical contact
and can be deployed only for sensing objects very close to the robot.
8. A second important class of sensors is location sensors. Most location sensors use range sensing as a primary
component to determine location. Outdoors, the Global Positioning System (GPS) is the most common solution
to the localization problem.
9. The third important class is proprioceptive sensors, which inform the robot of its own motion. To measure
the exact configuration of a robotic joint, motors are often equipped with shaft decoders that count the
revolution of motors in small increments.
10. Inertial sensors, such as gyroscopes, rely on the resistance of mass to the change of velocity. They can help
reduce uncertainty.
11. Other important aspects of robot state are measured by force sensors and torque sensors. These are
indispensable when robots handle fragile objects or objects whose exact shape and location is unknown.
Robotic Perception
1. Perception is the process by which robots map sensor measurements into internal representations of the
environment. Perception is difficult because sensors are noisy, and the environment is partially observable,
unpredictable, and often dynamic. In other words, robots have all the problems of state estimation (or filtering)
2. As a rule of thumb, good internal representations for robots have three properties: they contain enough
information for the robot to make good decisions, they are structured so that they can be updated efficiently,
and they are natural in the sense that internal variables correspond to natural state variables in the physical
world.
Figure. Robot perception can be viewed as temporal inference from a sequence of actions and measurements,
as illustrated by this dynamic Bayes network.
2. Another machine learning technique enables robots to continuously adapt to broad changes in sensor
measurements.
3. Adaptive perception techniques enable robots to adjust to such changes. Methods that make robots collect
their own training data (with labels!) are called self-supervised. In this instance, the robot uses machine learning
to leverage a short-range sensor that works well for terrain classification into a sensor that can see much farther.
PLANNING TO MOVE
1. All of a robot’s deliberations ultimately come down to deciding how to move effectors.
2. The point-to-point motion problem is to deliver the robot or its end effector to a designated target location.
3. A greater challenge is the compliant motion problem, in which a robot moves while being in physical contact
with an obstacle.
4. An example of compliant motion is a robot manipulator that screws in a light bulb, or a robot that pushes a
box across a tabletop. We begin by finding a suitable representation in which motion-planning problems can
be described and solved. It turns out that the configuration space—the space of robot states defined by
location, orientation, and joint angles—is a better place to work than the original 3D space.
5. The path planning problem is to find a path from one configuration to another in configuration space.
6. There are two main approaches: cell decomposition and skeletonization. Each reduces the continuous path-
planning problem to a discrete graph-search problem. In this section, we assume that motion is deterministic
and that localization of the robot is exact. Subsequent sections will relax these assumptions.
7. The second major family of path-planning algorithms is based on the idea of skeletonization. These
algorithms reduce the robot’s free space to a one-dimensional representation, for which the planning problem
is easier. This lower-dimensional representation is called a skeleton of the configuration space.
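Cell decomposition can be illustrated with a minimal sketch: the free space is discretized into a grid of cells, and the continuous path-planning problem becomes breadth-first graph search over free cells. The grid, start, and goal below are made-up illustrative values, not from any particular robot.

```python
from collections import deque

def grid_path(grid, start, goal):
    """Breadth-first search over a cell-decomposed configuration space.
    grid[r][c] == 0 means the cell lies in free space, 1 means occupied."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Reconstruct the path by walking parent links back to the start.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from:
                came_from[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None  # goal unreachable in this decomposition

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = grid_path(grid, (0, 0), (2, 0))
```

Because the decomposition is a finite graph, completeness holds only up to the grid resolution: a path squeezing through a gap narrower than one cell will be missed.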
Configuration Space
1. Consider a robot arm with two joints that move independently. Moving the joints alters the (x, y) coordinates
of the elbow and the gripper. (The arm cannot move in the z direction.) This suggests that the robot's
configuration can be described by a four-dimensional coordinate: (xe, ye) for the location of the elbow relative
to the environment and (xg, yg) for the location of the gripper. Clearly, these four coordinates characterize the
full state of the robot. They constitute what is known as the workspace representation.
2. Configuration spaces have their own problems. The task of a robot is usually expressed in workspace
coordinates, not in configuration space coordinates. This raises the question of how to map between workspace
coordinates and configuration space.
3. Mapping a configuration into workspace coordinates involves a chain of coordinate transformations. These
transformations are linear for prismatic joints and trigonometric for revolute joints. This chain of coordinate
transformations is known as kinematics.
4. The inverse problem of calculating the configuration of a robot whose effector location is specified in
workspace coordinates is known as inverse kinematics.
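For a two-link planar arm like the one above, kinematics and inverse kinematics can be sketched as follows. The link lengths L1 and L2 are assumed illustrative values, and the inverse solver returns only one of the two possible solutions (the "elbow-down" branch):

```python
import math

L1, L2 = 1.0, 0.8  # illustrative link lengths (assumed values)

def forward_kinematics(theta1, theta2):
    """Map joint angles (configuration space) to the gripper's (x, y)
    position in the workspace: trigonometric, since both joints are revolute."""
    xe = L1 * math.cos(theta1)                # elbow position
    ye = L1 * math.sin(theta1)
    xg = xe + L2 * math.cos(theta1 + theta2)  # gripper position
    yg = ye + L2 * math.sin(theta1 + theta2)
    return xg, yg

def inverse_kinematics(xg, yg):
    """Recover one joint-angle solution for a reachable gripper target,
    using the law of cosines on the triangle formed by the two links."""
    d2 = xg * xg + yg * yg
    cos_t2 = (d2 - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if not -1.0 <= cos_t2 <= 1.0:
        raise ValueError("target outside the reachable workspace")
    theta2 = math.acos(cos_t2)
    theta1 = math.atan2(yg, xg) - math.atan2(L2 * math.sin(theta2),
                                             L1 + L2 * math.cos(theta2))
    return theta1, theta2
```

Note that inverse kinematics generally has multiple solutions (mirror-image elbow configurations here), which is one reason the inverse problem is harder than the forward one.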
The configuration space can be decomposed into two subspaces: the space of all configurations that a robot
may attain, commonly called free space, and the space of unattainable configurations, called occupied space.
Figure (a) A repelling potential field pushes the robot away from obstacles, (b) Path found by simultaneously
minimizing path length and the potential.
Figure (a) The Voronoi graph is the set of points equidistant to two or more obstacles in configuration space.
(b) A probabilistic roadmap, composed of 100 randomly chosen points in free space.
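The probabilistic roadmap in figure (b) can be sketched as follows: sample random configurations in free space, then connect nearby pairs whose straight-line motion is collision-free. The names `sample_free`, `connect_radius`, and `collision_free`, and the obstacle-free unit-square example, are hypothetical choices for illustration:

```python
import math
import random

def prm(sample_free, n_points, connect_radius, collision_free):
    """Build a probabilistic roadmap: sample n_points configurations in free
    space and connect each nearby pair whose connecting motion is collision-free."""
    nodes = [sample_free() for _ in range(n_points)]
    edges = {i: [] for i in range(n_points)}
    for i in range(n_points):
        for j in range(i + 1, n_points):
            if math.dist(nodes[i], nodes[j]) <= connect_radius \
                    and collision_free(nodes[i], nodes[j]):
                edges[i].append(j)
                edges[j].append(i)
    return nodes, edges

# Toy 2-D free space: the unit square with no obstacles, so every
# short connection is trivially collision-free.
random.seed(0)
nodes, edges = prm(lambda: (random.random(), random.random()),
                   n_points=100, connect_radius=0.2,
                   collision_free=lambda a, b: True)
```

Once built, the roadmap is an ordinary graph, so path planning between two configurations reduces to graph search after linking the start and goal to their nearest roadmap nodes.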
Robust methods
1. A robust method is one that assumes a bounded amount of uncertainty in each aspect of a problem but does
not assign probabilities to values within the allowed interval.
2. A robust solution is one that works no matter what actual values occur, provided they are within the assumed
intervals.
3. An extreme form of robust method is the conformant planning approach.
Figure. A two-dimensional environment, velocity uncertainty cone, and envelope of possible robot motions.
The intended velocity is v, but with uncertainty the actual velocity could be anywhere in Cv, resulting in a final
configuration somewhere in the motion envelope, which means we wouldn’t know if we hit the hole or not.
Figure. The first motion command and the resulting envelope of possible robot motions. No matter what the
error, we know the final configuration will be to the left of the hole.
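The robustness criterion in the figures can be sketched as a simple check: a motion command is robust only if it succeeds for every error value in the assumed interval, not merely with high probability. The hole position, intended velocity, and error bounds below are assumed illustrative values:

```python
def is_robust(final_x, errors):
    """A motion command is robust if it succeeds for EVERY error value in the
    bounded interval; no probabilities are assigned within the interval."""
    return all(final_x(e) < HOLE_X for e in errors)

HOLE_X = 2.0           # hypothetical x-coordinate where the hole begins
v, t = 1.5, 1.0        # intended velocity and motion duration (assumed values)
errors = [-0.1 + 0.01 * k for k in range(21)]  # sampled velocity-error interval [-0.1, 0.1]

# Final x given velocity error e: with v = 1.5 the robot ends left of the
# hole for every admissible error, so the command is robust.
robust = is_robust(lambda e: (v + e) * t, errors)
```

A faster intended velocity of 2.0 would fail the same check, since some error values would carry the robot into (or past) the hole; that is exactly the distinction the motion envelope in the figure captures.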