
18AIC201J

FOUNDATION OF
ARTIFICIAL INTELLIGENCE

NOTES
UNIT III

SYLLABUS

Architecture for intelligent agents, Agent communication - Negotiation - Bargaining, Argumentation - Agents - Trust, Reputation - Multi-agent systems - AI applications - Language Models, Information Retrieval - Information extraction, Natural language processing - Machine translation, Speech recognition - Robot Hardware, Perception
3.1 DEFINITION
Agent architectures, like software architectures, are formally a description of the elements from which a system
is built and the way they communicate. Further, these elements can be defined from patterns with specific
constraints. [Shaw/Garlan 1996]
1. Several common architectures exist that go by the names pipe-and-filter or layered architecture.
2. These define the interconnections between components.
3. Pipe-and-filter defines a model where data is moved through a set of one or more objects that each perform a
transformation (see the sketch below).
4. Layered simply means that the system is composed of a set of layers that provide a specific set of logical
functionalities, and that connectivity is commonly restricted to the layers contiguous to one another.
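As a minimal illustration of the pipe-and-filter idea (a sketch, not from the source; the filter functions are illustrative assumptions), data flows through a chain of transformation objects:

def uppercase(data):
    # Filter 1: transform the text to upper case.
    return data.upper()

def strip_spaces(data):
    # Filter 2: remove the spaces.
    return data.replace(" ", "")

def pipeline(data, filters):
    # The pipe: move the data through each filter in turn.
    for f in filters:
        data = f(data)
    return data

print(pipeline("hello world", [uppercase, strip_spaces]))  # prints HELLOWORLD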

3.2 Architecture for intelligent agents


Based on the goals of the agent application, a variety of agent architectures exist to help. This section will
introduce some of the major architecture types and applications for which they can be used.
1. Reactive architectures
2. Deliberative architectures
3. Blackboard architectures
4. Belief-desire-intention (BDI) architecture
5. Hybrid architectures
6. Mobile architectures
1. REACTIVE ARCHITECTURES
1. A reactive architecture is the simplest architecture for agents.
2. In this architecture, agent behaviours are simply a mapping between stimulus and response.
3. The agent has no decision-making skills, only reactions to the environment in which it exists.
4. The agent simply reads the environment and then maps the state of the environment to one or more actions.
Given the environment, more than one action may be appropriate, and therefore the agent must choose.
5. The advantage of reactive architectures is that they are extremely fast.
6. This kind of architecture can be implemented easily in hardware, or as a fast lookup table in software.
7. The disadvantage of reactive architectures is that they apply only to simple environments.
8. Sequences of actions require the presence of state, which is not encoded into the mapping function.
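The following minimal sketch (an illustration, not from the source; the percept and action names are assumptions) shows a reactive agent as a direct stimulus-to-response lookup, with no internal state:

RULES = {
    "obstacle_ahead": "turn_left",
    "clear_path": "move_forward",
    "at_goal": "stop",
}

def reactive_agent(percept):
    # Map the current state of the environment directly to an action.
    return RULES.get(percept, "wait")

print(reactive_agent("obstacle_ahead"))  # turn_left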
2. DELIBERATIVE ARCHITECTURES
1. A deliberative architecture, as the name implies, is one that includes some deliberation over the action to
perform given the current set of inputs.
2. Instead of mapping the sensors directly to the actuators, the deliberative architecture considers the sensors,
state, prior results of given actions, and other information to select the best action to perform.
3. The mechanism for action selection is left undefined, because it could be any of a variety of mechanisms,
including a production system, a neural network, or any other intelligent algorithm.
4. The advantage of the deliberative architecture is that it can be used to solve much more complex problems
than the reactive architecture.
5. It can perform planning and perform sequences of actions to achieve a goal.
6. The disadvantage is that it is slower than the reactive architecture due to the deliberation for the action to
select.
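To contrast with the reactive sketch above, here is a hedged sketch of a deliberative agent that keeps internal state and scores candidate actions before choosing; the state model and scoring function are illustrative assumptions, since the notes leave the selection mechanism open:

class DeliberativeAgent:
    def __init__(self):
        self.state = {}      # internal model of the environment
        self.history = []    # prior results of given actions

    def deliberate(self, percept, candidate_actions, score):
        self.state.update(percept)
        # Choose the action the scoring mechanism rates best.
        best = max(candidate_actions, key=lambda a: score(self.state, a))
        self.history.append(best)
        return best

agent = DeliberativeAgent()
prefer_forward = lambda state, a: 1 if a == "move_forward" and not state.get("obstacle") else 0
print(agent.deliberate({"obstacle": False}, ["move_forward", "turn_left"], prefer_forward))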

Figure 3.1 Reactive architecture defines a simple agent.


Figure 3.2 A deliberative agent architecture considers its actions.
3. BLACKBOARD ARCHITECTURES
1. The blackboard architecture is a very common architecture that is also very interesting.
2. The first blackboard architecture was HEARSAY-II, a speech-understanding system. This
architecture operates around a global work area called the blackboard.
3. The blackboard is a common work area for several agents that work cooperatively to solve a given problem.
4. The blackboard therefore contains information about the environment, but also intermediate work results
by the cooperative agents.
5. In this example, two separate agents are used to sample the environment through the available sensors (the
sensor agent) and through the available actuators (action agent).
6. The blackboard contains the current state of the environment that is constantly updated by the sensor agent,
and when an action can be performed (as specified in the blackboard), the action agent translates this action
into control of the actuators.
7. The control of the agent system is provided by one or more reasoning agents.
8. These agents work together to achieve the goals, which would also be contained in the blackboard.
9. In this example, the first reasoning agent could implement the goal definition behaviours, where the second
reasoning agent could implement the planning portion (to translate goals into sequences of actions).
10. Since the blackboard is a common work area, coordination must be provided such that agents don’t step
over one another.
11. For this reason, agents are scheduled based on their need. For example, agents can monitor the blackboard,
and as information is added, they can request the ability to operate.
12. The scheduler can then identify which agents desire to operate on the blackboard, and then invoke them
accordingly.
13. The blackboard architecture, with its globally available work area, is easily implemented with a multi-
threading system.
14. Each agent becomes one or more system threads. From this perspective, the blackboard architecture is
very common for agent and non-agent systems.

Figure 3.3 The blackboard architecture supports multi-agent problem solving.
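The following minimal multi-threaded sketch (an illustration under assumed names, not from the source) mirrors the description above: a sensor agent writes the environment state to the shared work area, a reasoning agent posts an action, and the action agent translates it to the actuators:

import threading

class Blackboard:
    # The common work area, with a lock so agents don't step over one another.
    def __init__(self):
        self.lock = threading.Lock()
        self.data = {"state": None, "action": None}

    def write(self, key, value):
        with self.lock:
            self.data[key] = value

    def read(self, key):
        with self.lock:
            return self.data[key]

bb = Blackboard()

def sensor_agent():
    bb.write("state", "obstacle_ahead")      # keeps the state current

def reasoning_agent():
    if bb.read("state") == "obstacle_ahead":
        bb.write("action", "turn_left")      # posts an action on the blackboard

def action_agent():
    action = bb.read("action")
    if action:
        print("actuator:", action)           # translates the action to the actuators

# A trivial scheduler: invoke each agent (thread) in turn.
for behaviour in (sensor_agent, reasoning_agent, action_agent):
    t = threading.Thread(target=behaviour)
    t.start()
    t.join()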

4. BELIEF-DESIRE-INTENTION (BDI) ARCHITECTURE


1. BDI, which stands for Belief-Desire-Intention, is an architecture that follows the theory of human reasoning
as defined by Michael Bratman.
2. Belief represents the view of the world by the agent (what it believes to be the state of the environment in
which it exists). Desires are the goals that define the motivation of the agent (what it wants to achieve).
3. The agent may have numerous desires, which must be consistent. Finally, Intentions specify that the agent
uses the Beliefs and Desires to choose one or more actions to meet the desires.
4. As we described above, the BDI architecture defines the basic architecture of any deliberative agent. It stores
a representation of the state of the environment (beliefs), maintains a set of goals (desires), and finally, an
intentional element that maps desires to beliefs (to provide one or more actions that modify the state of the
environment based on the agent’s needs).

Figure 3.4 The BDI architecture desires to model mental attributes.
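A hedged sketch of one pass of a BDI control loop, following the description above (the belief revision and plan lookup used here are illustrative assumptions):

def bdi_step(beliefs, desires, percept, plan_for):
    beliefs.update(percept)                   # revise beliefs from the environment
    # Keep only the desires the agent can currently act on.
    achievable = [d for d in desires if plan_for(d, beliefs)]
    if not achievable:
        return None
    intention = achievable[0]                 # commit to one desire
    return plan_for(intention, beliefs)       # the action chosen to meet it

# Example: the agent believes it is raining and desires to stay dry.
plans = lambda d, b: "take_umbrella" if d == "stay_dry" and b.get("raining") else None
print(bdi_step({"raining": True}, ["stay_dry"], {}, plans))   # take_umbrella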


5. HYBRID ARCHITECTURES
1. As is the case in traditional software architecture, most architectures are hybrids.
2. For example, the architecture of a network stack is made up of a pipe-and-filter architecture and a layered
architecture.
3. This same stack also shares some elements of a blackboard architecture, as there are global elements that
are visible and used by each component of the architecture.
4. The same is true for agent architectures. Based on the needs of the agent system, different architectural
elements can be chosen to meet those needs.
6. MOBILE ARCHITECTURES
1. The final architectural pattern that we’ll discuss is the mobile agent architecture.
2. This architectural pattern introduces the ability for agents to migrate themselves between hosts. The agent
architecture includes the mobility element, which allows an agent to migrate from one host to another.
3. An agent can migrate to any host that implements the mobile framework.

4. The mobile agent framework provides a protocol that permits communication between hosts for agent
migration.
5. This framework also requires some kind of authentication and security, to prevent a mobile agent framework
from becoming a conduit for viruses. Also implicit in the mobile agent framework is a means for discovery.
6. For example, which hosts are available for migration, and what services do they provide? Communication is
also implicit, as agents can communicate with one another on a host, or across hosts in preparation for
migration.
7. The mobile agent architecture is advantageous as it supports the development of intelligent distributed
systems; indeed, of a distributed system that is dynamic, and whose configuration and loading are defined by
the agents themselves.

Figure 3.5 The mobile agent framework supports agent mobility.


7. ARCHITECTURE DESCRIPTIONS
1. Subsumption Architecture (Reactive Architecture)
2. Behaviour Networks (Reactive Architecture)
3. ATLANTIS (Deliberative Architecture)
4. Homer (Deliberative Architecture)
5. BB1 (Blackboard)
6. Open Agent Architecture (Blackboard)
7. Procedural Reasoning System (BDI)
8. Aglets (Mobile)
9. Messengers (Mobile)
10. Soar (Hybrid)
8. SUBSUMPTION ARCHITECTURE (REACTIVE ARCHITECTURE)
1. The Subsumption architecture, originated by Rodney Brooks in the late 1980s, was created out of research
in behaviour-based robotics.
2. The fundamental idea behind subsumption is that intelligent behaviour can be created through a collection
of simple behaviour modules.
3. These behaviour modules are collected into layers. At the bottom are behaviours that are reflexive in nature,
and at the top, behaviours that are more complex. Consider the abstract model shown in Figure.
4. At the bottom (level 0) exist the reflexive behaviours (such as obstacle avoidance). If these behaviours are
required, then level 0 consumes the inputs and provides an action at the output. But if no obstacles exist, the
next layer up is permitted to subsume control.
5. At each level, a set of behaviours with different goals compete for control based on the state of the
environment.
6. To support this capability, levels can be inhibited (in other words, their outputs are disabled). Levels can also
be suppressed such that sensor inputs are routed to higher layers, as shown in Figure 3.6.
7. Subsumption is a parallel and distributed architecture for managing sensors and actuators. The basic premise
is that we begin with a simple set of behaviours, and once we’ve succeeded there, we extend with additional
levels and higher- level behaviours.
8. For example, we begin with obstacle avoidance and then extend for object seeking. From this perspective,
the architecture takes a more evolutionary design approach.
9. Subsumption does have its problems. It is simple, but it turns out not to be very extensible. As new
layers are added, they tend to interfere with one another, and the problem then becomes how to layer
the behaviours such that each takes control when the time is right.

10. Subsumption is also reactive in nature, meaning that in the end, the architecture still simply maps inputs
to behaviours (no planning occurs, for example). What subsumption does provide is a means to choose which
behaviour for a given environment.

Figure 3.6 Architectural view of the subsumption architecture
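The arbitration idea can be sketched as follows (a minimal illustration with assumed behaviours; the real architecture wires inhibition and suppression between parallel modules rather than polling them):

def avoid_obstacle(percept):
    # Level 0: reflexive behaviour.
    return "turn_away" if percept.get("obstacle") else None

def seek_object(percept):
    # Level 1: higher-level behaviour, used when level 0 stays silent.
    return "move_to_object" if percept.get("object_seen") else None

LAYERS = [avoid_obstacle, seek_object]        # ordered bottom-up

def subsumption_step(percept):
    for layer in LAYERS:
        action = layer(percept)
        if action:                            # a lower layer takes control
            return action
    return "wander"

print(subsumption_step({"obstacle": True, "object_seen": True}))   # turn_away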

9. BEHAVIOUR NETWORKS (REACTIVE ARCHITECTURE)


1. Behaviour networks, created by Pattie Maes in the late 1980s, are another reactive architecture that is
distributed in nature. Behaviour networks attempt to answer the question: which action is best suited for a
given situation?
2. As the name implies, behaviour networks are networks of behaviours that include activation links and
inhibition links.
3. An example behaviour network for a game agent is shown in Figure. As shown in the legend, behaviours are
rectangles and define the actions that the agent may take (attack, explore, reload, etc.).
4. The ovals specify the preconditions for actions to be selected, which are inputs from the environment.
5. Preconditions connect to behaviours through activation links (they promote the behaviour to be performed)
or inhibition links (that inhibit the behaviour from being performed).
6. The environment is sampled, and then the behaviour for the agent is selected based on the current state of
the environment. The first thing to note is the activation and inhibition links. For example, when the agent’s
health is low, attack and exploration are inhibited, leaving the agent to find the nearest shelter. Also, while
exploring, the agent may come across medkits or ammunition.
7. If a medkit or ammunition is found, it’s used. Maes’ algorithm referred to competence modules, which
included preconditions (that must be fulfilled before the module can activate), actions to be performed, as well
as a level of activation.
8. The activation level is a threshold that is used to determine when a competence module may activate.
9. The algorithm also includes decay, such that activations dissipate over time. Like the subsumption
architecture, behaviour networks are instances of behaviour-based systems (BBS). The primitive actions
produced by these systems are all behaviours, based on the state of the environment.
10. Behaviour networks are not without problems. Being reactive, the architecture does not support planning
or higher-level behaviours. The architecture can also suffer when behaviours are highly inter-dependent. With
many competing goals, the behaviour modules can grow dramatically in order to realize the intended
behaviours. But for a simpler agent, such as the FPS game agent in Figure 3.7, this algorithm is ideal.

Figure 3.7 Behaviour network for a simple game agent
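A hedged sketch of Maes-style action selection follows; the modules, link weights, threshold, and decay rate are illustrative assumptions, not values from the notes:

DECAY = 0.9        # activations dissipate over time
THRESHOLD = 1.0    # a module may activate only above this level

modules = {
    "attack":  {"activation": 0.0, "excite": ["enemy_seen"], "inhibit": ["health_low"]},
    "shelter": {"activation": 0.0, "excite": ["health_low"], "inhibit": []},
}

def select_behaviour(percepts):
    fired = None
    for name, m in modules.items():
        a = m["activation"] * DECAY
        a += sum(1.0 for p in m["excite"] if p in percepts)    # activation links
        a -= sum(1.0 for p in m["inhibit"] if p in percepts)   # inhibition links
        m["activation"] = a
        if a >= THRESHOLD and (fired is None or a > modules[fired]["activation"]):
            fired = name
    return fired

# Low health inhibits attacking, so the agent seeks shelter.
print(select_behaviour({"enemy_seen", "health_low"}))   # shelter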

10. ATLANTIS (Deliberative Architecture)
1. The goal of ATLANTIS (A Three-Layer Architecture for Navigating Through Intricate Situations), was to create
a robot that could navigate through dynamic and imperfect environments in pursuit of explicitly stated high-
level goals.
2. ATLANTIS was to prove that a goal-oriented robot could be built from a hybrid architecture of lower-level
reactive behaviours and higher-level deliberative behaviours.

Figure 3.8 ATLANTIS Architecture


3. Where the subsumption architecture allows layers to subsume control, ATLANTIS operates on the
assumption that these behaviours are not exclusive of one another. The lowest layer can operate in a reactive
fashion to the immediate needs of the environment, while the uppermost layer can support planning and more
goal-oriented behaviours.
4. In ATLANTIS, control is performed from the bottom up. At the lowest level (the control layer) are the reactive
behaviours.
5. These primitive level actions are capable of being executed first, based on the state of the environment. At
the next layer is the sequencing layer. This layer is responsible for executing plans created by the deliberative
layer.
6. The deliberative layer maintains an internal model of the environment and creates plans to satisfy goals.
7. The sequencing layer may or may not complete the plan, based on the state of the environment. This leaves
the deliberation layer to perform the computationally expensive tasks. This is another place that the
architecture is a hybrid.
8. The lower-level behaviour-based methods (in the controller layer) are integrated with higher-level classical
AI mechanisms (in the deliberative layer). Interestingly, the deliberative layer does not control the sequencing
layer, but instead simply advises it on sequences of actions that it can perform.
9. The advantage of this architecture is that the low-level reactive layer and higher-level intentional layers are
asynchronous. This means that while deliberative plans are under construction, the agent is not susceptible to
the dynamic environment. This is because even though planning can take time at the deliberative layer, the
controller can deal with random events in the environment.
11. HOMER (DELIBERATIVE ARCH)
1. Homer is another interesting deliberative architecture that is both modular and integrated. Homer was
created by Vere and Bickmore in 1990 as a deliberative architecture with some very distinct differences to other
architectures.
2. At the core of the Homer architecture is a memory that is divided into two parts. The first part contains
general knowledge (such as knowledge about the environment). The second part is called episodic knowledge,
which is used to record experiences in the environment (perceptions and actions taken).
3. The natural language processor accepts human input via a keyboard, and parses and responds using a
sentence generator. The temporal planner creates dynamic plans to satisfy predefined goals and is capable of
replanning if the environment requires.

4. The architecture also includes a plan executor (or interpreter), which is used to execute the plan at the
actuators, as well as a variety of monitor processes. The basic idea behind Homer was an architecture for
general intelligence.
5. The keyboard would allow regular English language input, and a terminal would display generated English
language sentences. The user could therefore communicate with Homer to specify goals and receive feedback
via the terminal.
6. Homer could log perceptions of the world, with timestamps, to allow dialogue with the user and rational
answers to questions. Reflective (monitor) processes allow Homer to add or remove knowledge from the
episodic memory.
7. Homer is an interesting architecture implementing several interesting ideas, from natural language
processing to planning and reasoning. One issue found in Homer is that when the episodic memory grows large,
it tends to slow down the overall operation of the agent.

Figure 3.9 Homer Architecture


12. BB1 (BLACKBOARD)
1. BB1 is a domain-independent blackboard architecture for AI systems created by Barbara Hayes-Roth. The
architecture supports control over problem solving as well as explaining its actions. The architecture is also able
to learn new domain knowledge.
2. BB1 includes two blackboards: a domain blackboard, which acts as the global database, and a control
blackboard, which is used for generating a solution to the given control problem.
3. The key behind BB1 is its ability to incrementally plan. Instead of defining a complete plan for a given goal,
and then executing that plan, BB1 dynamically develops the plan and adapts to the changes in the environment.
This is key for dynamic environments, where unanticipated changes can lead to brittle plans that eventually fail.
13. PROCEDURAL REASONING SYSTEM (BDI)
1. The Procedural Reasoning System (PRS) is a general-purpose architecture that’s ideal for reasoning
environments where actions can be defined by predetermined procedures (action sequences).
2. PRS is also a BDI architecture, mimicking the theory on human reasoning. PRS integrates both reactive and
goal-directed deliberative processing in a distributed architecture.
3. The architecture can build a world-model of the environment (beliefs) through interacting with environment
sensors.
4. Actions can also be taken through an intentions module. At the core is an interpreter (or reasoner) which
selects a goal to meet (given the current set of beliefs) and then retrieves a plan to execute to achieve that
goal. PRS iteratively tests the assumptions of the plan during its execution. This means that it can operate in
dynamic environments where classical planners are doomed to fail.
5. Plans in PRS (also called knowledge areas) are predefined for the actions that are possible in the
environment. This simplifies the architecture because it isn’t required to generate plans, only select them based
on the environment and the goals that must be met.
6. While planning is more about selection than search or generation, the interpreter ensures that changes to
the environment do not result in inconsistencies in the plan. Instead, a new plan is selected to achieve the
specific goals.

8
7. PRS is a useful architecture when all necessary operations can be predefined. It’s also very efficient due to
lack of plan generation. This makes PRS an ideal agent architecture for building agents such as those to control
mobile robots.
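A minimal sketch of the selection idea follows (the plan library below is an illustrative assumption; in PRS proper, plans are knowledge areas with richer structure):

PLAN_LIBRARY = [
    {"goal": "deliver", "context": {"battery_ok": True},  "body": ["pick", "move", "drop"]},
    {"goal": "deliver", "context": {"battery_ok": False}, "body": ["recharge"]},
]

def select_plan(goal, beliefs):
    # Plans are selected, not generated: pick one whose context holds.
    for plan in PLAN_LIBRARY:
        if plan["goal"] == goal and all(beliefs.get(k) == v for k, v in plan["context"].items()):
            return plan["body"]
    return None   # no applicable plan; the interpreter would reconsider

print(select_plan("deliver", {"battery_ok": False}))   # ['recharge']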
14. AGLETS (MOBILE)
1. Aglets is a mobile agent framework designed by IBM Tokyo in the 1990s. Aglets is based on the Java
programming language, as it is well suited for a mobile agent’s framework. First, the applications are portable
to any system (both homogeneous and heterogeneous) that can run a Java Virtual Machine (JVM). Second, a
JVM is an ideal platform for migration services.
2. Java supports serialization, which is the aggregation of a Java application's program and data into a single
object that is restartable.
3. In this case, the Java application is restarted on a new JVM. Java also provides a secure environment
(sandbox) to ensure that a mobile agent framework doesn't become a virus distribution system. The Aglets
framework is layered: at the bottom is the JVM (the virtual machine that interprets the Java byte codes), and
the agent runtime environment and mobility protocol sit above it. The mobility protocol, called the Aglet
Transport Protocol (ATP), provides the means to serialize agents and then transport them to a host previously
defined by the agent.
4. The agent API is at the top of the stack, which in usual Java fashion, provides several API classes that focus
on agent operation. Finally, there are the various agents that operate on the framework.
5. The agent API and runtime environment provide several services that are central to a mobile agent
framework. Some of the more important functions are agent management, communication, and security.
Agents must be able to register themselves on a given host to enable communication from outside agents.
6. In order to support communication, security features must be implemented to ensure that the agent has
the authority to execute on the framework.
7. Aglets provides several necessary characteristics for a mobile agent framework, including mobility,
communication, security, and confidentiality. Aglets provides weak migration, in that agents can migrate only
at specific points within the code (such as with the dispatch method).
15. MESSENGERS (MOBILE)
1. Messengers is a runtime environment that provides a form of process migration (mobile agency).
2. One distinct strength of the messenger’s environment is that it supports strong migration, or the ability to
migrate at arbitrary points within the mobile application.
3. The Messengers environment provides the hop statement, which defines when and where to migrate to a
new destination.
4. After migration is complete, the Messengers agent restarts in the application at the point after the previous
hop statement. The result is that the application moves to the data, rather than using a messaging protocol to
move the data to the agent.
5. There are obvious advantages to this when the data set is large, and the migration links are slow. The
messengers model provides what the authors call Navigational Programming and Distributed Sequential
Computing (DSC).
6. What makes these concepts interesting is that they support the common model of programming that is
identical to the traditional flow of sequential programs. This makes them easier to develop and understand.
7. As an example of DSC in the Messengers environment, consider an application where, on a series of hosts,
we manipulate large matrices which are held in their memory.
16. SOAR (HYBRID)
1. Soar, which originally was an acronym for State-Operator-And-Result, is a symbolic cognitive architecture.
2. Soar provides a model of cognition along with an implementation of that model for building general-purpose
AI systems.
3. The idea behind Soar is from Newell’s unified theories of cognition. Soar is one of the most widely used
architectures, from research into aspects of human behaviour to the design of game agents for first person-
shooter games.
4. The goal of the Soar architecture is to build systems that embody general intelligence. While Soar includes
many elements that support this goal (for example, representing knowledge using procedural, episodic, and
declarative forms), it lacks some important aspects, including episodic memories and a model
for emotion. Soar's underlying problem-solving mechanism is based on a production system (expert system).

5. Behaviour is encoded in rules of the if-then form. Solving problems in Soar can be most simply described
as problem space search (to a goal node). If this model of problem solving fails, other methods are used, such
as hill climbing.
6. When a solution is found, Soar uses a method called chunking to learn a new rule based on this discovery. If
the agent encounters the problem again, it can use the rule to select the action to take instead of performing
problem solving again.

3.3 AGENT COMMUNICATION


In the domain of multi-agent systems, communication is an important characteristic to support both
coordination and the transfer of information. Agents also require the ability to communicate actions or plans.
But how the communication takes place is a function of its purpose.
1. Agents communicate to achieve better the goals of themselves or of the society/system in which they exist.
2. Communication can enable the agents to coordinate their actions and behaviour, resulting in systems that
are more coherent.
3. Coordination is a property of a system of agents performing some activity in a shared
environment.
4. The degree of coordination is the extent to which they avoid extraneous activity by reducing resource
contention, avoiding livelock and deadlock, and maintaining applicable safety conditions.
5. Cooperation is coordination among non-antagonistic agents, while negotiation is coordination among
competitive or simply self-interested agents.
6. Typically, to cooperate successfully, each agent must maintain a model of the other agents and also develop
a model of future interactions. This presupposes sociability. Coherence is how well a system behaves as a unit.
A problem for a multiagent system is how it can maintain global coherence without explicit global control. In
this case, the agents must be able on their own to determine goals they share with other agents, determine
common tasks, avoid unnecessary conflicts, and pool knowledge and evidence. It is helpful if there is some form
of organization among the agents.
3.3.1 Dimensions of Meaning
There are three aspects to the formal study of communication: syntax (how the symbols of communication are
structured), semantics (what the symbols denote), and pragmatics (how the symbols are interpreted). Meaning
is a combination of semantics and pragmatics. Agents communicate to understand and be understood, so it is
important to consider the different dimensions of meaning that are associated with communication.
1. Descriptive vs. Prescriptive. Some messages describe phenomena, while others prescribe behaviour.
Descriptions are important for human comprehension but are difficult for agents to mimic. Appropriately, then,
most agent communication languages are designed for the exchange of information about activities and
behaviour.
2. Personal vs. Conventional Meaning. An agent might have its own meaning for a message, but this might
differ from the meaning conventionally accepted by the other agents with which the agent communicates. To
the greatest extent possible, multiagent systems should opt for conventional meanings, especially since these
systems are typically open environments in which new agents might be introduced at any time.
3. Subjective vs. Objective Meaning. Similar to conventional meaning, where meaning is determined external
to an agent, a message often has an explicit effect on the environment, which can be perceived objectively. The
effect might be different from that understood internally, i.e., subjectively, by the sender or receiver of the
message.
4. Speaker's vs. Hearer's vs. Society's Perspective. Independent of the conventional or objective meaning of a
message, the message can be expressed according to the viewpoint of the speaker or hearer or other observers.
5. Semantics vs. Pragmatics. The pragmatics of a communication are concerned with how the communicators
use the communication. This includes considerations of the mental states of the communicators and the
environment in which they exist, considerations that are external to the syntax and semantics of the
communication.
6. Contextuality. Messages cannot be understood in isolation but must be interpreted in terms of the mental
states of the agents, the present state of the environment, and the environment's history: how it arrived at its
present state. Interpretations are directly affected by previous messages and actions of the agents.
7. Coverage. Smaller languages are more manageable, but they must be large enough so that an agent can
convey the meanings it intends.

8. Identity. When a communication occurs among agents, its meaning is dependent on the identities and roles
of the agents involved, and on how the involved agents are specified.
A message might be sent to a particular agent, or to just any agent satisfying a specified criterion.
9. Cardinality. A message sent privately to one agent would be understood differently than the same message
broadcast publicly.

3.4 NEGOTIATION
1. A frequent form of interaction that occurs among agents with different goals is termed negotiation.
2. Negotiation is a process by which a joint decision is reached by two or more agents, each trying to reach an
individual goal or objective. The agents first communicate their positions, which might conflict, and then try to
move towards agreement by making concessions or searching for alternatives.
3. The major features of negotiation are (1) the language used by the participating agents, (2) the protocol
followed by the agents as they negotiate, and (3) the decision process that each agent uses to determine its
positions, concessions, and criteria for agreement.
4. Many groups have developed systems and techniques for negotiation. These can be either environment-
centered or agent-centered. Developers of environment-centered techniques focus on the following problem:
"How can the rules of the environment be designed so that the agents in it, regardless of their origin,
capabilities, or intentions, will interact productively and fairly?"
The resultant negotiation mechanism should ideally have the following attributes:
• Efficiency: the agents should not waste resources in coming to an agreement.
• Stability: no agent should have an incentive to deviate from agreed-upon strategies.
• Simplicity: the negotiation mechanism should impose low computational and bandwidth demands on
the agents.
• Distribution: the mechanism should not require a central decision maker.
• Symmetry: the mechanism should not be biased against any agent for arbitrary or inappropriate
reasons.
5. An articulate and entertaining treatment of these concepts is found in [36]. In particular, three types of
environments have been identified: worth-oriented domains, state-oriented domains, and task-oriented
domains.
6. A task-oriented domain is one where agents have a set of tasks to achieve, all resources needed to achieve
the tasks are available, and the agents can achieve the tasks without help or interference from each other.
However, the agents can benefit by sharing some of the tasks. An example is the "Internet downloading
domain," where each agent is given a list of documents that it must access over the Internet. There is a cost
associated with downloading, which each agent would like to minimize. If a document is common to several
agents, then they can save downloading cost by accessing the document once and then sharing it.
7. The environment might provide the following simple negotiation mechanism and
constraints:
(1) each agent declares the documents it wants,
(2) documents found to be common to two or more agents are assigned to agents based on the toss of a coin,
(3) agents pay for the documents they download, and
(4) agents are granted access to the documents they download, as well as any in their common sets. This
mechanism is simple, symmetric, distributed, and efficient (no document is downloaded twice). To determine
stability, the agents' strategies must be considered.
8. An optimal strategy is for an agent to declare the true set of documents that it needs, regardless of what
strategy the other agents adopt or the documents they need. Because there is no incentive for an agent to
diverge from this strategy, it is stable.
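A worked sketch of this mechanism (an illustration with assumed agent names and a unit cost per download):

import random

def negotiate(declarations):
    # declarations: agent -> set of documents it declares it wants.
    downloads = {agent: set() for agent in declarations}
    for doc in set().union(*declarations.values()):
        wanters = [a for a, docs in declarations.items() if doc in docs]
        # A common document is downloaded once, by a coin-toss winner,
        # and then shared with the other agents that declared it.
        downloads[random.choice(wanters)].add(doc)
    return downloads   # each agent pays len(downloads[agent])

result = negotiate({"a1": {"d1", "d2"}, "a2": {"d2", "d3"}})
print(result)   # "d2" is fetched once and shared: total cost 3, not 4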
9. Agent-centered negotiation mechanisms follow two general approaches. For the first approach, speech-act
classifiers together with a possible-worlds semantics are used to formalize negotiation protocols and their
components. This clarifies the conditions of satisfaction for different kinds of messages. To provide a flavor of
this approach, the following example shows how the commitments that an agent might make as part of a
negotiation are formalized [21]:

10. This rule states that an agent forms and maintains its commitment to achieve φ individually iff (1) it has not
precommitted itself to another agent to adopt and achieve φ, (2) it has a goal to achieve φ individually, and (3)
it is willing to achieve φ individually. The chapter on "Formal Methods in DAI" provides more information on
such descriptions.
11. The second approach assumes that the agents are economically rational. Further, the set of agents must
be small, they must have a common language and common problem abstraction, and they must reach a
common solution. Under these assumptions, Rosenschein and Zlotkin [37] developed a unified negotiation
protocol.
Agents that follow this protocol create a deal, that is, a joint plan between the agents that would satisfy all of
their goals. The utility of a deal for an agent is the amount it is willing to pay minus the cost of the deal. Each
agent wants to maximize its own utility.
The agents discuss a negotiation set, which is the set of all deals that have a positive utility for
every agent.
In formal terms, a task-oriented domain under this approach becomes a tuple <T, A, c>, where T is the set of
tasks, A is the set of agents, and c(X) is a monotonic function giving the cost of executing the set of tasks X. A
deal is a redistribution of tasks. The utility of deal d for agent k is U_k(d) = c(T_k) - c(d_k), where T_k is agent
k's original task set and d_k its task set under the deal.
The conflict deal D occurs when the agents cannot reach a deal. A deal d is individually rational if d ≥ D (each
agent does at least as well as under the conflict deal). Deal d is Pareto optimal if there is no deal d' > d (no deal
makes one agent better off without making another worse off). The set of all deals that are individually rational
and Pareto optimal is the negotiation set, NS. There are three possible situations:
1. conflict: the negotiation set is empty
2. compromise: agents prefer to be alone, but since they are not, they will agree to a negotiated deal
3. cooperative: all deals in the negotiation set are preferred by both agents over achieving their goals alone.
When there is a conflict, then the agents will not benefit by negotiating—they are better off acting alone.
Alternatively, they can "flip a coin" to decide which agent gets to satisfy its goals.
Negotiation is the best alternative in the other two cases.
Since the agents have some execution autonomy, they can in principle deceive or mislead each other.
Therefore, an interesting research problem is to develop protocols or societies in which the effects of deception
and misinformation can be constrained. Another aspect of the research problem is to develop protocols under
which it is rational for agents to be honest with each other. The connections of the economic approaches with
human-oriented negotiation and argumentation have not yet been fully worked out.

3.5 BARGAINING
Link: http://www.cse.iitd.ernet.in/~rahul/cs905/lecture15/index.html (refer to this link for easy understanding)
A bargaining problem is defined as a pair (S, d). A bargaining solution is a function f that maps every bargaining
problem (S, d) to an outcome in S, i.e., f : (S, d) → S.
Thus the solution to a bargaining problem is a pair in R². It gives the values of the game to the two players and
is generated through a function called the bargaining function.
The bargaining function maps the set of possible outcomes to the set of acceptable ones.

Bargaining Solution
• In a transaction when the seller and the buyer value a product differently, a surplus is created.
• A bargaining solution is then a way in which buyers and sellers agree to divide the surplus.
• For example, consider a house made by a builder A. It cost him Rs. 10 lacs. A potential buyer is
interested in the house and values it at Rs. 20 lacs. This transaction can generate a surplus of Rs. 10
lacs. The builder and the buyer now need to trade at a price. The buyer knows that the cost is less than
20 lacs and the seller knows that the value is greater than 10 lacs. The two of them need to agree on
a price. Both try to maximize their surplus. The buyer would want to buy it for 10 lacs, while the seller
would like to sell it for 20 lacs. They bargain on the price, and either trade or dismiss.
• Trade would result in the generation of surplus, whereas no surplus is created in case of no-trade.
Bargaining Solution provides an acceptable way to divide the surplus among the two parties.
• Formally, a Bargaining Solution is defined as F : (X, d) → S, where X ⊆ R² and S, d ∈ R². X represents the
utilities of the players in the set of possible bargaining agreements. d represents the point of
disagreement. In the above example, price ∈ [10, 20], and the bargaining set is simply x + y ≤ 10, x ≥ 0,
y ≥ 0. A point (x, y) in the bargaining set represents the case when the seller gets a surplus of x and the
buyer gets a surplus of y, i.e., the seller sells the house at 10 + x and the buyer pays 20 - y.

Nash modelled a two-person bargaining problem in terms of:
1. the set of payoff allocations that are jointly feasible for the two players in the process of negotiation or
arbitration, and
2. the payoffs they would expect if negotiation or arbitration were to fail to reach a settlement.
Based on these assumptions, Nash generated a list of axioms that a reasonable solution ought to satisfy. These
axioms are as follows:
Axiom 1 (Individual Rationality) This axiom asserts that the bargaining solution should give neither player less
than what it would get from disagreement, i.e., f(S, d) ≥ d.
Axiom 2 (Symmetry) As per this axiom, the solution should be independent of the names of the players, i.e.,
who is named a and who is named b. This means that when the players’ utility functions and their disagreement
utilities are the same, they receive equal shares. So any symmetries in the final payoff should only be due to
the differences in their utility functions or their disagreement outcomes.
Axiom 3 (Strong Efficiency) This axiom asserts that the bargaining solution should be feasible and Pareto
optimal.
Axiom 4 (Invariance) According to this axiom, the solution should not change as a result of linear changes to
the utility of either player. So, for example, if a player’s utility function is multiplied by 2, this should not change
the solution. Only the player will value what it gets twice as much.
Axiom 5 (Independence of Irrelevant Alternatives) This axiom asserts that eliminating feasible alternatives
(other than the disagreement point) that would not have been chosen should not affect the solution. Nash
proved that the bargaining solution satisfying the above five axioms is the point that maximizes the product of
the players' gains over their disagreement payoffs (the Nash product), i.e.,
f(S, d) = arg max over (x, y) in S, with x ≥ d1 and y ≥ d2, of (x - d1)(y - d2).
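As a numerical check on the house example above, the following sketch maximizes the Nash product over the Pareto frontier x + y = 10 with disagreement point (0, 0); the grid search is just an illustration:

def nash_solution(total, d=(0.0, 0.0), steps=1000):
    best, best_product = None, -1.0
    for i in range(steps + 1):
        x = total * i / steps          # seller's surplus
        y = total - x                  # buyer's surplus (Pareto frontier)
        product = (x - d[0]) * (y - d[1])
        if product > best_product:
            best, best_product = (x, y), product
    return best

print(nash_solution(10))   # (5.0, 5.0): the symmetric split, as Axiom 2 predicts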

GAME-THEORETIC APPROACHES FOR MULTI-ISSUE NEGOTIATION


The following are the four key procedures for bargaining over multiple issues:
1. Global bargaining: Here, the bargaining agents directly tackle the global problem in which all the issues are
addressed at once. In the context of non-cooperative theory, the global bargaining procedure is also called the
package deal procedure. In this procedure, an offer from one agent to the other would specify how each one
of the issues is to be resolved.
2. Independent/separate bargaining: Here negotiations over the individual issues are totally separate and
independent, with each having no effect on the other. This would be the case if each of the two parties
employed m agents (for negotiating over m issues), with each agent in charge of negotiating one issue. For
example, in negotiations between two countries, each issue may be resolved by representatives from the
countries who care only about their individual issue.
3. Sequential bargaining with independent implementation: Here the two parties consider one issue at a time.
For instance, they may negotiate over the first issue, and after reaching an agreement on it, move on to
negotiate the second, and so on. Here, the parties may not negotiate an issue until the previous one is resolved.
There are several forms of the sequential procedure. These are defined in terms of the agenda and the
implementation rule. For sequential bargaining, the agenda specifies the order in which the issues will be
bargained. The implementation rule specifies when an agreement on an individual issue goes into effect. There
are two implementation rules: the rule of independent implementation and the rule of simultaneous
implementation.
4. Sequential bargaining with simultaneous implementation: This is like the previous case except that now an
agreement on an issue does not take effect until an agreement is reached on all the subsequent issues.
Cooperative Models of Multi-Issue Negotiation
1. Simultaneous implementation agenda independence: This axiom states that global bargaining and
sequential bargaining with simultaneous implementation yield the same agreement.
2. Independent implementation agenda independence: This axiom states that global bargaining and sequential
bargaining with independent implementation yield the same agreement.
3. Separate/global equivalence: This axiom states that global bargaining and separate bargaining yield the same
agreement.

Non-Cooperative Models of Multi-Issue Negotiation


An agent's cumulative utility is linear and additive. The functions Ua and Ub give the cumulative utilities for a
and b respectively at time t, obtained by summing the (weighted) utilities each agent receives on the individual issues.

3.6 ARGUMENTATION

➢ “A verbal and social activity of reason aimed at increasing (or decreasing) the acceptability of a controversial
standpoint for the listener or reader, by putting forward a constellation of propositions (i.e. arguments)
intended to justify (or refute) the standpoint before a rational judge”
➢ Argumentation can be defined as an activity aimed at convincing of the acceptability of a standpoint by
putting forward propositions justifying or refuting the standpoint.
➢ Argument: reasons/justifications supporting a conclusion
➢ Represented as: support -> conclusion
– Informational arguments: Beliefs -> Belief. E.g., if it is cloudy, it might rain.
– Motivational arguments: Beliefs, Desires -> Desire. E.g., if it is cloudy and you want to go out, then you don't
want to get wet.
– Practical arguments: Beliefs, Sub-Goals -> Goal. E.g., if it is cloudy and you own a raincoat, then put on the
raincoat.
– Social arguments: Social commitments -> Goal, Desire. E.g., I will stop at the corner because the law says so;
I can't do that, I promised my mother that I won't.
Process of Argumentation
1. Constructing arguments (in favor of / against a “statement”) from available information.
A: “Tweety is a bird, so it flies”
B: “Tweety is just a cartoon!”
2. Determining the different conflicts among the arguments.
“Since Tweety is a cartoon, it cannot fly!” (B attacks A)
Evaluating the acceptability of the different arguments
“Since we have no reason to believe otherwise, we’ll assume Tweety is a cartoon.”
(Accept B). “But then, this means despite being a bird he cannot fly.” (Reject A).
3. Concluding or defining the justified conclusions.
“We conclude that Tweety cannot fly!”
Computational Models of Argumentation:
1. Given the definition of arguments over a content language (and its logic), the models allow one to:
• Compute interactions between arguments: attacks, defeat, support, ...
• Assign values to arguments so that they can be compared:
o the intrinsic value of an argument
o the interaction-based value of an argument
2. Select the acceptable arguments (conclusions):
• Individual acceptability
• Collective acceptability

3.7 TRUST & REPUTATION IN MULTI-AGENT SYSTEMS
It depends on the level we apply it:
1. User confidence
• Can we trust the user behind the agent?
– Is he/she a trustworthy source of some kind of knowledge? (e.g. an expert in a field)
– Does he/she act in the agent system (through his/her agents) in a trustworthy way?
2. Trust of users in agents
• Issues of autonomy: the more autonomy, the less trust
• How to create trust?
– Reliability testing for agents
– Formal methods for open MAS
– Security and verifiability
3. Trust of agents in agents
• Reputation mechanisms
• Contracts
• Norms and Social Structures
What is Trust?
1. In closed environments, cooperation among agents is included as part of the designing process.
2. The multi-agent system is usually built by a single developer or a single team of developers, and the
developers' chosen option to reduce complexity is to ensure cooperation among the agents they build, including
it as an important system requirement.
3. Benevolence assumption: an agent ai requesting information or a certain service from agent aj can be sure
that aj will answer if aj has the capabilities and the resources needed; otherwise aj will inform ai that it cannot
perform the action requested.
4. It can be said that in closed environments trust is implicit.
Trust can be computed as
1. A binary value (1 = 'I do trust this agent', 0 = 'I don't trust this agent')
2. A set of qualitative values or a discrete set of numerical values (e.g. 'trust always', 'trust conditional to X',
'no trust'; or '2', '1', '0', '-1', '-2')
3. A continuous numerical value (e.g. [-300, 300])
4. A probability distribution
5. Degrees over underlying beliefs and intentions (cognitive approach)
HOW TO COMPUTE TRUST
1. Trust values can be externally defined
• by the system designer: the trust values are pre-defined
• by the human user: he can introduce his trust values about the humans behind the other
agents
2. Trust values can be inferred from some existing representation about the interrelations
between the agents
• Communication patterns, cooperation history logs, e-mails, webpage connectivity
mapping...
3. Trust values can be learnt from current and past experiences
• Increase the trust value for agent ai if it behaves properly with us
• Decrease the trust value for agent ai if it fails us or defects (see the sketch after this list)
4. Trust values can be propagated or shared through a MAS
• Recommender systems, Reputation mechanisms.
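A minimal sketch of experience-based trust (point 3 above); the update rule and learning rate are illustrative assumptions, with trust kept in [-1, 1]:

trust = {}

def update_trust(agent, good, rate=0.1):
    t = trust.get(agent, 0.0)          # unknown agents start neutral
    # Move toward +1 on good experiences and toward -1 on bad ones.
    t = t + rate * (1 - t) if good else t - rate * (1 + t)
    trust[agent] = t
    return t

update_trust("aj", good=True)
update_trust("aj", good=True)
print(round(update_trust("aj", good=False), 3))   # 0.071: one defection lowers trust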
TRUST AND REPUTATION
1. Most authors in the literature mix trust and reputation
2. Some authors make a distinction between them
3. Trust is an individual measure of confidence that a given agent has over other agent(s)
4. Reputation is a social measure of confidence that a group of agents or a society has over
agents or groups. Reputation is one mechanism to compute (individual) trust.
• I will trust more an agent that has a good reputation
• My reputation clearly affects the amount of trust that others have towards me.
• Reputation can have a sanctioning role in social groups: a bad reputation can be very
costly to one’s future transactions.
5. Most authors combine (individual) Trust with some form of (social) Reputation in their models
6. Recommender systems, Reputation mechanisms.

Figure 3.10 TRUST AND REPUTATION


Direct experiences are the most relevant and reliable information source for individual trust/reputation.
1. Type 1: Experience based on direct interaction with the partner
1. Used by almost all models
2. How to:
• trust value about that partner increases with good experiences,
• it decreases with bad ones
3. Problem: how to compute trust if there is no previous interaction?
3. Type 2: Experience based on observed interaction of other members
1. Used only in scenarios prepared for this.
2. How to: depends on what an agent can observe
• agents can access to the log of past interactions of other agents
• agents can access some feedback from agents about their past interactions (e.g., in eBay)
3. Problem: one must introduce some noise handling or a confidence level on this information
4. Prior-derived: agents bring with them prior beliefs about strangers
Used by some models to initialize trust/reputation values.
How to:
• the designer or human user assigns prior values
• a uniform distribution for reputation priors is set
• Give new agents the lowest possible reputation value: then there is no incentive to throw away a
cyber-identity when an agent's reputation falls below the starting point.
• Assume neither good nor bad reputation for unknown agents.
• Avoid giving the lowest reputation to new, valid agents, since it is an obstacle for other agents to
realize that they are valid.

5. Group-derived:
• Models for groups can be extended to provide prior reputation estimates for agents in social
groups.
• Mapping between the initial individual reputation of a stranger and the group from which he or she
comes.
• Problem: highly domain-dependent and model-dependent.
6. Propagated:
• An agent can attempt to estimate the stranger's reputation based on information garnered from others
in the environment; this is also called word of mouth.
• Problem: the combination of the different reputation values tends to be an ad-hoc solution with no
social basis.
TRUST AND REPUTATION MODELS
1. Not really designed for MAS, but can be applied to MAS
2. Idea: for serious life/business decisions, you want the opinion of a trusted expert
3. If an expert is not personally known, then you want to find a reference to one via a chain of friends and
colleagues
4. Referral-chains provide:
• a way to judge the quality of the expert's advice
• a reason for the expert to respond in a trustworthy manner
• Finding good referral-chains is slow and time-consuming, but vital (hence the advice of business gurus
on "networking")
• The set of all possible referral-chains = a social network
5. Model integrates information from
• Official organizational charts (online)
• Personal web pages (+ crawling)
• External publication databases
• Internal technical document databases
6. Builds a social network based in referral chains
• Each node is a recommender agent
• Each node provides reputation values for specific areas
o E.g. Frieze is good in mathematics
• Searches in the referral network are made by areas
o E.g. browsing the network’s “mathematics” recommendation chains
7. Trust Model Overview
• 1-to-1 asymmetric trust relationships.
• Direct trust and recommender trust.
• Trust categories and trust values [-1,0,1,2,3,4].
8. Conditional transitivity.
Alice trusts Bob AND Bob trusts Cathy => Alice trusts Cathy
Alice trusts.rec Bob AND Bob says Bob trusts Cathy => Alice may trust Cathy
Alice trusts.rec Bob value X AND Bob says Bob trusts Cathy value Y => Alice may trust Cathy value f(X, Y)
9. Recommendation protocol
1. Alice -> Bob: RRQ(Eric)
2. Bob -> Cathy: RRQ(Eric)
3. Cathy -> Bob: Rec(Eric, 3)
4. Bob -> Alice: Rec(Eric, 3)
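A hedged sketch of this protocol combined with conditional transitivity: the recommendation travels back along the referral chain and is discounted at each hop by the recommender trust, via f(X, Y). The multiplicative choice of f and the trust values are illustrative assumptions:

REC_TRUST = {("Alice", "Bob"): 0.8, ("Bob", "Cathy"): 0.9}   # recommender trust
DIRECT = {("Cathy", "Eric"): 3.0}                            # Cathy's direct value for Eric

def f(x, y):
    # One plausible combination: scale the value by the recommender trust.
    return x * y

def rrq(asker, chain, target):
    # The last agent in the chain answers from direct experience;
    # the value is then discounted hop by hop on the way back.
    value = DIRECT[(chain[-1], target)]
    for i in range(len(chain) - 1, 0, -1):
        value = f(REC_TRUST[(chain[i - 1], chain[i])], value)
    return f(REC_TRUST[(asker, chain[0])], value)

print(round(rrq("Alice", ["Bob", "Cathy"], "Eric"), 2))   # 2.16 = 0.8 * 0.9 * 3.0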

Figure 3.11 Procedure
12. Direct Trust:
1. ReGreT assumes that there is no difference between direct interaction and direct observation in terms of
the reliability of the information. It talks about direct experiences.
2. The basic element to calculate a direct trust is the outcome.
3. An outcome of a dialog between two agents can be either:
• an initial contract to take a particular course of action, together with the actual result of the actions
taken, or
• an initial contract to fix the terms and conditions of a transaction, together with the actual values
of the terms of the transaction.
13. Reputation Model: Witness reputation
a. First step to calculate a witness reputation is to identify the set of witnesses that will be considered
by the agent to perform the calculation.
b. The initial set of potential witnesses might be
i. the set of all agents that have interacted with the target agent in the past.
ii. This set, however, can be very big, and the information provided by its members will probably
suffer from the correlated evidence problem.
c. Next step is to aggregate these values to obtain a single value for the witness reputation.

The importance of each piece of information in the final reputation value will be proportional to the witness
credibility.
14. Reputation Model: Witness reputation
a. Two methods to evaluate witness credibility:
i. ReGreT uses fuzzy rules to calculate how the structure of social relations influences the
credibility on the information. The antecedent of each rule is the type and degree of a social
relation (the edges in a sociogram) and the consequent is the credibility of the witness from
the point of view of that social relation.
ii. The second method used in the ReGreT system to calculate the credibility of a witness is to
evaluate the accuracy of previous pieces of information sent by that witness to the agent. The
agent uses the direct trust value to measure the truthfulness of the information received
from witnesses.
15. Reputation Model: Neighbourhood Reputation
a. Neighbourhood in a MAS is not related to the physical location of the agents but to the links
created through interaction.
b. The main idea is that the behaviour of these neighbours and the kind of relation they have with the
target agent can give some clues about the behaviour of the target agent.
c. To calculate a Neighbourhood Reputation the ReGreT system uses fuzzy rules.
i. The antecedents of these rules are one or several direct trusts associated with different
behavioural aspects and the relation between the target agent and the neighbour.
ii. The consequent is the value for a concrete reputation (that can be associated to the same
behavioural aspect of the trust values or not).
16. Reputation Model: System Reputation
a. System reputation uses the common knowledge about social groups and the role that the agent is
playing in the society as a mechanism to assign default reputations to the agents.
b. ReGreT assumes that the members of these groups have one or several observable features that
unambiguously identify their membership.
c. Each time an agent performs an action we consider that it is playing a single role.
i. E.g. an agent can play the role of buyer and seller but when it is selling a product only the role of
seller is relevant.
17. System reputations are calculated using a table for each social group where the rows are the roles the
agent can play for that group, and the columns the behavioural aspects.
18. Reputation Model: Default Reputation
a. To the previous reputation types, we must add a fourth one, the reputation assigned to a third-party
agent when there is no information at all: the default reputation.
b. Usually this will be a fixed value.
19. Reputation Model: Combining reputations
a. Each reputation type has different characteristics, and there are a lot of heuristics that can be used to
aggregate the four reputation values to obtain a single and representative reputation value.
b. In ReGreT this heuristic is based on the default and calculated reliability assigned to each type.
c. Assuming we have enough information to calculate all the reputation types, we take the stance that
witness reputation is the first type that should be considered, followed by the neighbourhood
reputation, the system reputation, and finally the default reputation.
20. Main criticism to Trust and Reputation research:
a. Proliferation of ad-hoc models weakly grounded in social theory.
b. No general, cross-domain model for reputation
c. Lack of integration between models
i. Comparison between models is unfeasible
ii. Researchers are trying to solve this by means of, e.g., the ART competition

3.8 LANGUAGE MODELS
• Language can be defined as a set of strings.
• "print(2+2)" is a legal program in the language Python, whereas "2) + (2 print" is not. Since there are
an infinite number of legal programs, they cannot be enumerated; instead, they are specified by a set
of rules called a grammar. Formal languages also have rules that define the meaning, or semantics, of
a program.
• For example, the rules say that the "meaning" of "2 + 2" is 4, and the meaning of "1/0" is that an error
is signaled.
1. Natural languages, such as English or Spanish, cannot be characterized as a definite set of sentences.
Example: Everyone agrees that “Not to be invited is sad” is a sentence of English, but people disagree on the
grammaticality of “To be not invited is sad”.
Therefore, it is more fruitful to define a natural language model as a probability distribution over sentences
rather than a definitive set. That is, rather than asking if a string of words is or is not a member of the set
defining the language, we instead ask for P(S = words): the probability that a random sentence would
be words.
Natural languages are also ambiguous. “He saw her duck” can mean either that he saw a waterfowl belonging
to her, or that he saw her move to evade something. Thus, again, we cannot speak of a single meaning for a
sentence, but rather of a probability distribution over possible meanings.
2. Finally, natural languages are difficult to deal with because they are very large and constantly changing. Thus,
our language models are, at best, an approximation. We start with the simplest possible approximation and move
up from there.

Language modelling (LM) is the use of various statistical and probabilistic techniques to determine the
probability of a given sequence of words occurring in a sentence. Language models analyze bodies of text data
to provide a basis for their word predictions. They are used in natural language processing (NLP) applications,
particularly ones that generate text as an output. Some of these applications include machine translation and
question answering.

How language modelling works


Language models determine word probability by analyzing text data. They interpret this data by feeding it
through an algorithm that establishes rules for context in natural language. Then, the model applies these rules
in language tasks to accurately predict or produce new sentences. The model essentially learns the features
and characteristics of basic language and uses those features to understand new phrases.

There are several different probabilistic approaches to modeling language, which vary depending on the
purpose of the language model. From a technical perspective, the various types differ by the amount of text
data they analyze and the math they use to analyze it. For example, a language model designed to generate
sentences for an automated Twitter bot may use different math and analyze text data in a different way than
a language model designed for determining the likelihood of a search query.

Some common statistical language modeling types are:

• N-gram. N-grams are a relatively simple approach to language models. They create a probability
distribution for a sequence of n words. The n can be any number, and defines the size of the "gram", or
sequence of words being assigned a probability. For example, if n = 5, a gram might look like this:
"can you please call me." The model then assigns probabilities to sequences of size n.
Basically, n can be thought of as the amount of context the model is told to consider. Some types
of n-grams are unigrams, bigrams, trigrams and so on.
N-gram Language Model:
An N-gram language model predicts the probability of a given N-gram within any sequence of words in the
language. A good N-gram model can predict the next word in the sentence, i.e. the value of P(w|h).

20
Examples of N-grams: unigrams (“This”, “article”, “is”, “on”, “NLP”) or bi-grams (“This article”, “article is”,
“is on”, “on NLP”).
Now, we will establish a relation for finding the next word in a sentence using N-grams.
We need to calculate P(w|h), where w is the candidate for the next word and h is the history of preceding
words. For example, in the example above, suppose we want to calculate the probability of the last word
being “NLP” given the previous words:

P(“NLP” | “This article is on”)

After generalizing, the above equation can be written as:

P(wn | w1, w2, ..., wn-1)

But how do we calculate it? The answer lies in the chain rule of probability:

P(w1, w2, ..., wn) = P(w1) P(w2|w1) P(w3|w1, w2) ... P(wn|w1, ..., wn-1)

Now, generalizing the above equation:

P(w1, ..., wn) = Π (i = 1 to n) P(wi | w1, ..., wi-1)

Simplifying the above formula using the Markov assumption (only the last few words of the history matter):

P(wi | w1, ..., wi-1) ≈ P(wi | wi-k, ..., wi-1)

• For unigram:

P(w1, ..., wn) ≈ Π P(wi)

• For Bigram:

P(wi | w1, ..., wi-1) ≈ P(wi | wi-1)
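To make the bigram formula concrete, here is a minimal Python sketch of a maximum-likelihood bigram model; the toy corpus and whitespace tokenization are illustrative assumptions.

from collections import Counter

corpus = "this article is on NLP . this article is short .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev):
    """P(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(words):
    """Bigram (Markov) approximation of P(w1..wn); the first word is scored
    with its unigram probability for simplicity."""
    p = unigrams[words[0]] / len(corpus)
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_bigram("article", "this"))                          # 1.0 in this toy corpus
print(p_sentence(["this", "article", "is", "on", "NLP"]))   # ~0.09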

SMOOTHING N-GRAM MODELS


1. The major complication of n-gram models is that the training corpus provides only an estimate of the true
probability distribution.
2. For a common character sequence such as “th”, any English corpus will give a good estimate: about 1.5% of all
trigrams.
3. On the other hand, “ht” is very uncommon – no dictionary word starts with ht. It is likely that the sequence
would have a count of zero in a training corpus of standard English. Does that mean we should assign
P(“ht”) = 0? If we did, then the text “The program issues an http request” would have an English probability
of zero, which seems wrong.
4. The process of adjusting the probability of low-frequency counts is called smoothing.
5. A better approach is a backoff model, in which we start by estimating n-gram counts, but for any sequence
that has a low (or zero) count, we back off to (n-1)-grams. Linear interpolation smoothing is a backoff model
that combines trigram, bigram, and unigram models by linear interpolation. It defines the probability estimate as
P*(ci | ci-2:i-1) = λ3 P(ci | ci-2:i-1) + λ2 P(ci | ci-1) + λ1 P(ci), where λ3 + λ2 + λ1 = 1.
It is also possible to have the values of λi depend on the counts: if we have a high count of trigrams, then we
weigh them relatively more; if only a low count, then we put more weight on the bigram and unigram models,
as sketched below.
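A minimal Python sketch of the interpolation formula above for a character trigram model; the tiny corpus and the fixed lambda weights are assumptions (a real system would tune the weights on held-out data).

from collections import Counter

text = "the program issues an http request"
uni = Counter(text)
bi = Counter(zip(text, text[1:]))
tri = Counter(zip(text, text[1:], text[2:]))
N = len(text)

l3, l2, l1 = 0.6, 0.3, 0.1   # assumed weights; l3 + l2 + l1 = 1

def p_hat(c, prev1, prev2):
    """P*(c | prev2 prev1): interpolated estimate, nonzero even when the
    trigram count is zero, thanks to the bigram and unigram terms."""
    p3 = tri[(prev2, prev1, c)] / bi[(prev2, prev1)] if bi[(prev2, prev1)] else 0.0
    p2 = bi[(prev1, c)] / uni[prev1] if uni[prev1] else 0.0
    p1 = uni[c] / N
    return l3 * p3 + l2 * p2 + l1 * p1

# The trigram "thq" never occurs, but the estimate is still small and nonzero.
print(p_hat("q", "h", "t"))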
• Unigram. The unigram is the simplest type of language model. It doesn't look at any conditioning
context in its calculations. It evaluates each word or term independently. Unigram models
commonly handle language processing tasks such as information retrieval. The unigram is the
foundation of a more specific model variant called the query likelihood model, which uses
information retrieval to examine a pool of documents and match the most relevant one to a specific
query.
• Bidirectional. Unlike n-gram models, which analyze text in one direction (backwards), bidirectional
models analyze text in both directions, backwards and forwards. These models can predict any
word in a sentence or body of text by using every other word in the text. Examining text
bidirectionally increases result accuracy. This type is often utilized in machine learning and speech
generation applications. For example, Google uses a bidirectional model to process search queries.
• Exponential. Also known as maximum entropy models, this type is more complex than n-grams.
Simply put, the model evaluates text using an equation that combines feature functions and n-
grams. Basically, this type specifies features and parameters of the desired results, and unlike n-
grams, leaves analysis parameters more ambiguous -- it doesn't specify individual gram sizes, for
example. The model is based on the principle of entropy, which states that the probability
distribution with the most entropy is the best choice. In other words, the model with the most
chaos, and least room for assumptions, is the most accurate. Exponential models are designed to
maximize cross entropy, which minimizes the amount of statistical assumptions that can be made.
This enables users to better trust the results they get from these models.
• Continuous space. This type of model represents words as a non-linear combination of weights in
a neural network. The process of assigning a weight to a word is also known as word embedding.
This type becomes especially useful as data sets get increasingly large, because larger datasets
often include more unique words. The presence of a lot of unique or rarely used words can cause
problems for a linear model like an n-gram. This is because the number of possible word sequences
increases, and the patterns that inform results become weaker. By weighting words in a non-linear,
distributed way, this model can "learn" to approximate words and therefore not be misled by any
unknown values. Its "understanding" of a given word is not as tightly tethered to the immediate
surrounding words as it is in n-gram models.

The models listed above are more general statistical approaches from which more specific variant language
models are derived. For example, as mentioned in the n-gram description, the query likelihood model is a more
specific or specialized model that uses the n-gram approach. Model types may be used in conjunction with one
another.

The models listed also vary significantly in complexity. Broadly speaking, more complex language models are
better at NLP tasks, because language itself is extremely complex and always evolving. Therefore, an
exponential model or continuous space model might be better than an n-gram for NLP tasks, because they are
designed to account for ambiguity and variation in language.

A good language model should also be able to process long-term dependencies, handling words that may derive
their meaning from other words that occur in far-away, disparate parts of the text. An LM should be able to
understand when a word is referencing another word from a long distance, as opposed to always relying on
proximal words within a certain fixed history. This requires a more complex model.

Importance of language modeling


Language modeling is crucial in modern NLP applications. It is the reason that machines can understand
qualitative information. Each language model type, in one way or another, turns qualitative information into
quantitative information. This allows people to communicate with machines as they do with each other to a
limited extent.

It is used directly in a variety of industries including tech, finance, healthcare, transportation, legal, military and
government. Additionally, it's likely most people reading this have interacted with a language model in some
way at some point in the day, whether it be through Google search, an autocomplete text function or engaging
with a voice assistant.
The roots of language modeling as it exists today can be traced back to 1948. That year, Claude Shannon
published a paper titled "A Mathematical Theory of Communication." In it, he detailed the use of a stochastic
model called the Markov chain to create a statistical model for the sequences of letters in English text. This
paper had a large impact on the telecommunications industry and laid the groundwork for information theory and
language modeling. The Markov model is still used today, and n-grams specifically are tied very closely to the
concept.

Uses and examples of language modeling


Language models are the backbone of natural language processing (NLP). Below are some NLP tasks that use
language modeling, what they mean, and some applications of those tasks:

• Speech recognition-- involves a machine being able to process speech audio. This is commonly used
by voice assistants like Siri and Alexa.
• Machine translation -- involves the translation of one language to another by a machine. Google
Translate and Microsoft Translator are two programs that do this. SDL Government is another,
which is used to translate foreign social media feeds in real time for the U.S. government.
• Parts-of-speech tagging -- involves the markup and categorization of words by certain grammatical
characteristics. This is utilized in the study of linguistics, first and perhaps most famously in the
study of the Brown Corpus, a body of text composed of random English prose that was designed to be
studied by computers. This corpus has been used to train several important language models,
including one used by Google to improve search quality.
• Parsing -- involves analysis of any string of data or sentence that conforms to formal grammar and
syntax rules. In language modeling, this may take the form of sentence diagrams that depict each
word's relationship to the others. Spell checking applications use language modeling and parsing.
• Sentiment analysis -- involves determining the sentiment behind a given phrase. Specifically, it can
be used to understand opinions and attitudes expressed in a text. Businesses can use this to analyze
product reviews or general posts about their product, as well as analyze internal data like employee
surveys and customer support chats. Some services that provide sentiment analysis tools are
Repustate and Hubspot's ServiceHub. Google's NLP tool -- called Bidirectional Encoder
Representations from Transformers (BERT) -- is also used for sentiment analysis.
• Optical character recognition -- involves the use of a machine to convert images of text into
machine encoded text. The image may be a scanned document or document photo, or a photo
with text somewhere in it -- on a sign, for example. It is often used in data entry when processing
old paper records that need to be digitized. It can also be used to analyze and identify handwriting
samples.
• Information retrieval -- involves searching in a document for information, searching for documents
in general, and searching for metadata that corresponds to a document. Web browsers are the
most common information retrieval applications.
3.9 INFORMATION RETRIEVAL
Information Retrieval (IR) can be defined as a software program that deals with the organization, storage,
retrieval, and evaluation of information from document repositories, particularly textual information.
Information Retrieval is the activity of obtaining material (usually documents) of an unstructured nature
(usually text) that satisfies an information need from within large collections stored on
computers. For example, Information Retrieval takes place when a user enters a query into the system.
Not only librarians and professional searchers engage in the activity of information retrieval;
nowadays hundreds of millions of people engage in IR every day when they use web search engines.
Information Retrieval is believed to be the dominant form of Information access. The IR system assists the
users in finding the information they require but it does not explicitly return the answers to the question. It
notifies regarding the existence and location of documents that might consist of the required information.
Information retrieval also extends support to users in browsing or filtering document collection or processing
a set of retrieved documents. The system searches over billions of documents stored on millions of
computers. Email programs provide manual or automatic means, such as spam filters, for classifying mail
so that it can be placed directly into particular folders.
An IR system can represent, store, organize, and access information items. A set of keywords are required to
search. Keywords are what people are searching for in search engines. These keywords summarize the
description of the information.

What is an IR Model?

An Information Retrieval (IR) model selects and ranks the document that is required by the user or the user
has asked for in the form of a query. The documents and the queries are represented in a similar manner, so
that document selection and ranking can be formalized by a matching function that returns a retrieval status
value (RSV) for each document in the collection. Many of the Information Retrieval systems represent
document contents by a set of descriptors, called terms, belonging to a vocabulary V. An IR model determines
the query-document matching function; one common approach is
the estimation of the probability of the user's relevance rel for each document d and query q with respect to a
set Rq of training documents: Prob(rel | d, q, Rq). (A minimal ranking sketch follows below.)
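As a rough sketch of such a matching function, the following Python code ranks a toy document collection against a query using a simple tf-idf score standing in for the retrieval status value (RSV). The collection, tokenization, and weighting scheme are all simplifying assumptions, not a specific production model.

import math
from collections import Counter

docs = {
    "d1": "information retrieval from document repositories",
    "d2": "retrieval of data from a database",
    "d3": "storage and organization of textual information",
}

df = Counter()                       # document frequency of each term
for text in docs.values():
    df.update(set(text.split()))

def rsv(query, text):
    """Retrieval status value: sum of tf * idf over the query terms."""
    tf = Counter(text.split())
    return sum(tf[t] * math.log(len(docs) / df[t]) for t in query.split() if df[t])

query = "information retrieval"
for doc_id, text in sorted(docs.items(), key=lambda kv: -rsv(query, kv[1])):
    print(doc_id, round(rsv(query, text), 3))   # d1 ranks first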

Types of IR Models
Components of Information Retrieval/ IR Model

• Acquisition: In this step, the selection of documents and other objects from various web
resources that consist of text-based documents takes place. The required data is collected by web
crawlers and stored in the database.
• Representation: It consists of indexing that contains free-text terms, controlled vocabulary,
and manual & automatic techniques. Example: abstracting (which contains summarizing) and
bibliographic description (which contains author, title, sources, data, and metadata).
• File Organization: There are two types of file organization methods: Sequential (documents are
stored document by document) and Inverted (term by term, with a list of records under each
term). A combination of both may also be used.
• Query: An IR process starts when a user enters a query into the system. Queries are formal
statements of information needs, for example, search strings in web search engines. In
information retrieval, a query does not uniquely identify a single object in the collection. Instead,
several objects may match the query, perhaps with different degrees of relevancy.
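The inverted file organization mentioned above can be sketched in a few lines of Python; the toy records are assumptions, and a real system would add compression, ranking, and so on.

from collections import defaultdict

records = {
    1: "acquisition of documents by web crawlers",
    2: "indexing with controlled vocabulary",
    3: "documents and metadata indexing",
}

inverted = defaultdict(list)            # term -> list of record IDs (posting list)
for rec_id, text in records.items():
    for term in set(text.split()):
        inverted[term].append(rec_id)

def lookup(*terms):
    """Query evaluation: intersect the posting lists of the query terms."""
    postings = [set(inverted[t]) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(lookup("documents", "indexing"))   # -> [3]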

Difference Between Information Retrieval and Data Retrieval

• Information Retrieval: the software program that deals with the organization, storage, retrieval, and
evaluation of information from document repositories, particularly textual information. Data Retrieval:
deals with obtaining data from a database management system such as an ODBMS; it is the process of
identifying and retrieving data from the database, based on the query provided by the user or application.
• Information Retrieval retrieves information about a subject; Data Retrieval determines the keywords in
the user query and retrieves the data.
• In Information Retrieval, small errors are likely to go unnoticed; in Data Retrieval, a single error object
means total failure.
• Information Retrieval is not always well structured and is semantically ambiguous; Data Retrieval has a
well-defined structure and semantics.
• Information Retrieval does not provide a solution to the user of the database system; Data Retrieval
provides solutions to the user of the database system.
• In Information Retrieval the results obtained are approximate matches; in Data Retrieval the results
obtained are exact matches.
• In Information Retrieval results are ordered by relevance; in Data Retrieval results are unordered by
relevance.
• Information Retrieval is a probabilistic model; Data Retrieval is a deterministic model.
User Interaction With Information Retrieval System

The User Task: The information need first has to be translated into a query by the user. In an information
retrieval system, there is a set of words that conveys the semantics of the information that is required,
whereas in a data retrieval system, a query expression is used to convey the constraints that must be satisfied
by the objects. Example: a user may want to search for something but end up searching for something else;
this means that the user is browsing and not searching. The above figure shows the interaction of the user
through different tasks.
• Logical View of the Documents: A long time ago, documents were represented through a set of
index terms or keywords. Nowadays, modern computers represent documents by their full set of
words, from which the set of representative keywords can be reduced, e.g. by eliminating
stopwords (articles and connectives). These are text operations, and they reduce the
complexity of the document representation from full text to a set of index terms.

Past, Present, and Future of Information Retrieval

1. Early Developments: As there was an increase in the need for a lot of information, it became necessary to
build data structures to get faster access. The index is the data structure for faster retrieval of information.
Over centuries manual categorization of hierarchies was done for indexes.
2. Information Retrieval in Libraries: Libraries were the first to adopt IR systems for information retrieval. The
first generation consisted of the automation of previous technologies, and search was based on author
name and title. The second generation added searching by subject heading, keywords, etc. The third
generation introduced graphical interfaces, electronic forms, hypertext features, etc.
3. The Web and Digital Libraries: The Web is cheaper than many other sources of information, it provides
greater access to networks due to digital communication, and it gives free access to publish on a larger medium.

3.10 INFORMATION EXTRACTION


Information extraction is the process of acquiring knowledge by skimming a text and looking for
occurrences of a particular class of object and for relationships among objects. A typical task is to extract
instances of addresses from Web pages, with database fields for street, city, state, and zip code, or instances
of storms from weather reports, with fields for temperature, wind speed, and precipitation. In a limited domain,
this can be done with high accuracy.
FINITE-STATE AUTOMATA FOR INFORMATION EXTRACTION
1. The simplest type of information extraction system is an attribute-based extraction system that assumes
that the entire text refers to a single object and the task is to extract attributes of that object.
2. One step up from attribute-based extraction systems are relational extraction systems, which deal with
multiple objects and the relations among them.
3. A typical relational-based extraction system is FASTUS, which handles news stories about corporate mergers
and acquisitions.
4. A relational extraction system can be built as a series of cascaded finite-state transducers.
5. That is, the system consists of a series of small, efficient finite-state automata (FSAs), where each automaton
receives text as input, transduces the text into a different format, and passes it along to the next automaton.
FASTUS consists of five stages:
1. Tokenization
2. Complex-word handling
3. Basic-group handling
4. Complex-phrase handling
5. Structure merging
6. FASTUS’s first stage is tokenization, which segments the stream of characters into tokens (words, numbers, and
punctuation). For English, tokenization can be simple; just separating characters at white space or punctuation
does a fairly good job. Some tokenizers also deal with markup languages such as HTML, SGML, and XML.
7. The second stage handles complex words, including collocations such as “set up” and “joint venture,” as well
as proper names such as “Bridgestone Sports Co.” These are recognized by a combination of lexical entries and
finite- state grammar rules.
8. The third stage handles basic groups, meaning noun groups and verb groups. The idea is to chunk these into
units that will be managed by the later stages.
9. The fourth stage combines the basic groups into complex phrases. Again, the aim is to have rules that are
finite- state and thus can be processed quickly, and that result in unambiguous (or nearly unambiguous) output
phrases. One type of combination rule deals with domain-specific events.
10. The final stage merges structures that were built up in the previous step. If the next sentence says “The
joint venture will start production in January,” then this step will notice that there are two references to a joint
venture, and that they should be merged into one. This is an instance of the identity uncertainty problem.
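A minimal sketch of attribute-based extraction in the finite-state spirit described above, using regular expressions. The sample text and patterns are illustrative assumptions, not FASTUS itself.

import re

text = "IBM ThinkBook 970. Our price: $399.00, ships in 2 days."

patterns = {
    "price": r"\$\d+(\.\d{2})?",        # dollar amount, optional cents
    "ship_days": r"ships in (\d+) days?",
}

for attr, pat in patterns.items():
    m = re.search(pat, text)
    # Each attribute is extracted independently from the single-object text.
    print(attr, "->", m.group(0) if m else None)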
PROBABILISTIC MODELS FOR INFORMATION EXTRACTION
1. The simplest probabilistic model for sequences with hidden state is the hidden Markov
model, or HMM.
2. HMMs have two big advantages over FSAs for extraction.
• First, HMMs are probabilistic, and thus tolerant to noise.
• In a regular expression, if a single expected character is missing, the regex fails to match; with HMMs
there is graceful degradation with missing characters/words, and we get a probability indicating the
degree of match, not just a Boolean match/fail.
• Second, HMMs can be trained from data; they don’t require laborious engineering of templates, and
thus they can more easily be kept up to date as text changes over time.

Figure. Hidden Markov model for the speaker of a talk announcement. The two square states are the target
(note the second target state has a self-loop, so the target can match a string of any length), the four circles to
the left are the prefix, and the one on the right is the postfix. For each state, only a few of the high-probability
words are shown. From Freitag and McCallum (2000).
3. Once the HMMs have been learned, we can apply them to a text, using the Viterbi algorithm to find the
most likely path through the HMM states (see the sketch after this list). One approach is to apply each attribute
HMM separately; in this case you would expect most of the HMMs to spend most of their time in background
states. This is appropriate
when the extraction is sparse - when the number of extracted words is small compared to the length of the
text.
4. The other approach is to combine all the individual attributes into one big HMM, which
would then find a path that wanders through different target attributes, first finding a speaker target, then a
date target, etc. Separate HMMs are better when we expect just one of each attribute in a text and one big
HMM is better when the texts are more free-form and dense with attributes.
5. HMMs have the advantage of supplying probability numbers that can help make the choice. If some targets
are missing, we need to decide if this is an instance of the desired relation at all, or if the targets found are false
positives. A machine learning algorithm can be trained to make this choice.
ONTOLOGY EXTRACTION FROM LARGE CORPORA
1. A different application of extraction technology is building a large knowledge base or ontology of facts from
a corpus. This is different in three ways:
• First it is open-ended—we want to acquire facts about all types of domains, not just one specific domain.
• Second, with a large corpus, this task is dominated by precision, not recall—just as with question
answering on the Web.
• Third, the results can be statistical aggregates gathered from multiple sources, rather than being
extracted from one specific text.
2. Here is one of the most productive templates: NP such as NP (, NP) * (,)? ((and | or) NP)?
3. Here the bold words and commas must appear literally in the text, but the parentheses are for grouping,
the asterisk means repetition of zero or more, and the question mark means optional.
4. NP is a variable standing for a noun phrase
5. This template matches the texts “diseases such as rabies affect your dog” and “supports network protocols
such as DNS,” concluding that rabies is a disease and DNS is a network protocol.
6. Similar templates can be constructed with the key words “including,” “especially,” and “or other.” Of course
these templates will fail to match many relevant passages, like “Rabies is a disease.” That is intentional.
7. The “NP is a NP” template does indeed sometimes denote a subcategory relation, but it often means
something else, as in “There is a God” or “She is a little tired.” With a large corpus we can afford to be picky; to
use only the high-precision templates.
8. We’ll miss many statements of a subcategory relationship, but most likely we’ll find a paraphrase of the
statement somewhere else in the corpus in a form we can use.
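The "NP such as NP" template above can be approximated over raw text with a regular expression; this sketch treats single words as stand-ins for noun phrases, which is a simplifying assumption (a real system matches parsed noun phrases).

import re

# Approximation of: NP such as NP (, NP)* (,)? ((and|or) NP)?
pattern = re.compile(r"(\w+) such as (\w+(?:, \w+)*(?:,? (?:and|or) \w+)?)")

for text in ["diseases such as rabies affect your dog",
             "supports network protocols such as DNS"]:
    m = pattern.search(text)
    if m:
        category = m.group(1)
        members = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", m.group(2))
        print(members, "are instances of", category)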

AUTOMATED TEMPLATE CONSTRUCTION


Consider a set of example strings, each containing an author and a title. Clearly these are examples of the
author–title relation, but the learning system had no knowledge of authors
or titles. The words in these examples were used in a search over a Web corpus, resulting in 199 matches. Each
match is defined as a tuple of seven strings, (Author, Title, Order, Prefix, Middle, Postfix, URL), where Order is
true if the author came first and false if the title came first, Middle is the characters between the author and
title, Prefix is the 10 characters before the match, Postfix is the 10 characters after the match, and URL is the
Web address where the match was made.
1. Each template has the same seven components as a match.
2. The Author and Title are regexes consisting of any characters (but beginning and ending in letters) and
constrained to have a length from half the minimum length of the examples to twice the maximum length.
3. The prefix, middle, and postfix are restricted to literal strings, not regexes.
4. The middle is the easiest to learn: each distinct middle string in the set of matches is a distinct candidate
template. For each such candidate, the template’s Prefix is then defined as the longest common suffix of all the
prefixes in the matches, and the Postfix is defined as the longest common prefix of all the postfixes in the
matches.
5. If either of these is of length zero, then the template is rejected.
6. The URL of the template is defined as the longest prefix of the URLs in the matches.
The biggest weakness in this approach is the sensitivity to noise. If one of the first few templates is incorrect,
errors can propagate quickly. One way to limit this problem is to not accept a new example unless it is verified
by multiple templates, and not accept a new template unless it discovers multiple examples that are also found
by other templates.
3.11 NATURAL LANGUAGE PROCESSING
Natural Language Processing (NLP) refers to the AI method of communicating with intelligent systems using a
natural language such as English.
Processing of natural language is required when you want an intelligent system like a robot to perform as per
your instructions, when you want to hear a decision from a dialogue-based clinical expert system, etc.
The field of NLP involves making computers perform useful tasks with the natural languages humans use.
The input and output of an NLP system can be −
• Speech
• Written Text

Components of NLP

There are two components of NLP as given −


Natural Language Understanding (NLU)
Understanding involves the following tasks −
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
Natural Language Generation (NLG)
It is the process of producing meaningful phrases and sentences in the form of natural language from some
internal representation.
It involves −
• Text planning − It includes retrieving the relevant content from knowledge base.
• Sentence planning − It includes choosing required words, forming meaningful phrases, setting
tone of the sentence.
• Text Realization − It is mapping sentence plan into sentence structure.
The NLU is harder than NLG.

Difficulties in NLU

NL has an extremely rich form and structure.


It is very ambiguous. There can be different levels of ambiguity −
• Lexical ambiguity − It is at a very primitive level such as the word level.
• For example, should the word “board” be treated as a noun or a verb?
• Syntax-level ambiguity − A sentence can be parsed in different ways.
• For example, “He lifted the beetle with red cap.” − Did he use a red cap to lift
the beetle, or did he lift a beetle that had a red cap?
• Referential ambiguity − Referring to something using pronouns. For example, Rima went to
Gauri. She said, “I am tired.” − Exactly who is tired?
• One input can mean different meanings.
• Many inputs can mean the same thing.

NLP Terminology

• Phonology − It is the study of organizing sound systematically.


• Morphology − It is the study of the construction of words from primitive meaningful units.
• Morpheme − It is the primitive unit of meaning in a language.
• Syntax − It refers to arranging words to make a sentence. It also involves determining the
structural role of words in the sentence and in phrases.
• Semantics − It is concerned with the meaning of words and how to combine words into
meaningful phrases and sentences.
• Pragmatics − It deals with using and understanding sentences in different situations and how
the interpretation of the sentence is affected.
• Discourse − It deals with how the immediately preceding sentence can affect the interpretation
of the next sentence.
• World Knowledge − It includes the general knowledge about the world.

Steps in NLP

There are five general steps −


• Lexical Analysis − It involves identifying and analyzing the structure of words. The lexicon of a
language means the collection of words and phrases in that language. Lexical analysis divides
the whole chunk of text into paragraphs, sentences, and words.
• Syntactic Analysis (Parsing) − It involves analysis of words in the sentence for grammar and
arranging words in a manner that shows the relationship among the words. The sentence such
as “The school goes to boy” is rejected by English syntactic analyzer.

• Semantic Analysis − It draws the exact meaning or the dictionary meaning from the text. The
text is checked for meaningfulness. It is done by mapping syntactic structures and objects in
the task domain. The semantic analyzer disregards sentence such as “hot ice-cream”.
• Discourse Integration − The meaning of any sentence depends upon the meaning of the
sentence just before it. In addition, it also brings about the meaning of immediately succeeding
sentence.
• Pragmatic Analysis − During this, what was said is re-interpreted on what it actually meant. It
involves deriving those aspects of language which require real world knowledge.

Implementation Aspects of Syntactic Analysis

There are several algorithms researchers have developed for syntactic analysis, but we consider only the
following simple methods −
• Context-Free Grammar
• Top-Down Parser
Let us see them in detail −
Context-Free Grammar
It is the grammar that consists of rules with a single symbol on the left-hand side of the rewrite rules. Let us
create grammar to parse a sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily understand and
process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which describe
what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols. For
example, if there are two strings Noun Phrase (NP) and Verb Phrase (VP), then the string formed
by NP followed by VP is a sentence. The rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
The parse tree can be created as shown −

Now consider the above rewrite rules. Since V can be replaced by both “peck” or “pecks”, sentences such as
“The bird peck the grains” can be wrongly permitted, i.e. the subject-verb agreement error is accepted as
correct.
Merit − It is the simplest style of grammar and is therefore the most widely used one.
Demerits −
• They are not highly precise. For example, “The grains peck the bird” is syntactically correct
according to the parser, but even though it makes no sense, the parser takes it as a correct sentence.
• To bring out high precision, multiple sets of grammar need to be prepared. It may require a
completely different set of rules for parsing singular and plural variations, passive sentences,
etc., which can lead to the creation of a huge, unmanageable set of rules.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that
matches the classes of the words in the input sentence until it consists entirely of terminal symbols.
These are then checked against the input sentence to see if it matches. If not, the process is started over again
with a different set of rules. This is repeated until a specific rule is found which describes the structure of the
sentence. (A minimal sketch appears after this subsection.)
Merit − It is simple to implement.
Demerits −
• It is inefficient, as the search process has to be repeated if an error occurs.
• Slow speed of working.
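As referenced above, here is a minimal Python sketch of top-down parsing for the bird grammar; the generator-based search is one simple way to implement the restart-on-failure behaviour, and the grammar/lexicon encoding is an assumption of this sketch.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N":   {"bird", "birds", "grain", "grains"},
    "V":   {"peck", "pecks", "pecking"},
}

def parse(symbol, words):
    """Try to expand `symbol` over `words`; yield the remaining words on success."""
    if symbol in LEXICON:
        if words and words[0] in LEXICON[symbol]:
            yield words[1:]
        return
    for rule in GRAMMAR[symbol]:          # try each rewrite rule in turn
        remainders = [words]
        for sym in rule:
            remainders = [rest for r in remainders for rest in parse(sym, r)]
        yield from remainders

sentence = "the bird pecks the grains".split()
print(any(rest == [] for rest in parse("S", sentence)))                      # True
print(any(rest == [] for rest in parse("S", "the bird peck the grains".split())))
# Also True - illustrating the missing subject-verb agreement noted above.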

3.12 MACHINE TRANSLATION SYSTEMS


All translation systems must model the source and target languages, but systems vary in the type of models
they use. Some systems attempt to analyse the source-language text all the way into an interlingua knowledge
representation and then generate sentences in the target language from that representation. This is difficult
because it involves three unsolved problems: creating a complete knowledge representation of everything;
parsing into that representation; and generating sentences from that representation. Other systems are based
on a transfer model.
STATISTICAL MACHINE TRANSLATION
1. Find parallel texts: First, gather a parallel bilingual corpus. For example, a Hansard is a record of
parliamentary debate. Canada, Hong Kong, and other countries produce bilingual Hansards, the European
Union publishes its official documents in 11 languages, and the United Nations publishes multilingual
documents.
Bilingual text is also available online; some Web sites publish parallel content with parallel URLs, for example,
/en/ for the English page and /fr/ for the corresponding French page. The leading statistical translation
systems train on hundreds of millions of words of parallel text and billions of words of monolingual text.
2. Segment into sentences: The unit of translation is a sentence, so we will have to break the corpus into
sentences. Periods are strong indicators of the end of a sentence, but consider “Dr. J. R. Smith of Rodeo Dr.
paid $29.99 on 9.9.09”; only the final period ends a sentence. One way to decide if a period ends a sentence
is to train a model that takes as features the surrounding words and their parts of speech. This approach
achieves about 98% accuracy.
3. Align sentences: For each sentence in the English version, determine what sentence(s) it corresponds to in
the French version. Usually, the next sentence of English corresponds to the next sentence of French in a 1:1
match, but sometimes there is variation: one sentence in one language will be split into a 2:1 match, or the
order of two sentences will be swapped, resulting in a 2:2 match. By looking at the sentence lengths alone (i.e.,
short sentences should align with short sentences), it is possible to align them (1:1, 1:2, or 2:2, etc.) with accuracy in
the 90% to 99% range using a variation on the Viterbi algorithm.
4. Align phrases: Within a sentence, phrases can be aligned by a process that is similar to that used for sentence
alignment but requiring iterative improvement. When we start, we have no way of knowing that “qui dort”
aligns with “sleeping”, but we can arrive at that alignment by a process of aggregation of evidence.
5. Extract distortions: Once we have an alignment of phrases, we can define distortion probabilities. Simply
count how often distortion occurs in the corpus for each distance.
6. Improve estimates with EM: Use expectation–maximization to improve the estimates of the P(f|e) and P(d)
values. We compute the best alignments with the current values of these parameters in the E step, then update
the estimates in the M step and iterate the process until convergence. (A counting sketch for P(f|e) follows below.)
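A minimal sketch of the relative-frequency estimate of P(f|e) from aligned phrase pairs, the kind of count-based update used in the M step. The toy pairs stand in for the output of phrase alignment and are purely illustrative.

from collections import Counter

aligned_pairs = [                       # (English phrase, French phrase) - assumed
    ("sleeping", "qui dort"), ("sleeping", "qui dort"),
    ("sleeping", "endormi"), ("the cat", "le chat"),
]

counts = Counter(aligned_pairs)
totals = Counter(e for e, _ in aligned_pairs)

def p_f_given_e(f, e):
    """P(f | e) = count(e, f) / count(e), by relative frequency."""
    return counts[(e, f)] / totals[e]

print(p_f_given_e("qui dort", "sleeping"))   # 2/3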
3.13 SPEECH RECOGNITION
Definition: Speech recognition is the task of identifying a sequence of words uttered by a speaker, given
the acoustic signal. It has become one of the mainstream applications of AI.
1. Example: The phrase “recognize speech” sounds almost the same as “wreck a nice beach” when spoken
quickly. Even this short example shows several of the issues that make speech problematic.
2. First, segmentation: written words in English have spaces between them, but in fast speech there are no
pauses in “wreck a nice” that would distinguish it as a multiword phrase as opposed to the single word
“recognize”.
3. Second, coarticulation: when speaking quickly the “s” sound at the end of “nice” merges with the “b” sound
at the beginning of “beach”, yielding something that is close to a “sp”. Another problem that does not show up
in this example is homophones – words like “to”, “too” and “two” that sound the same but differ in meaning.
4. Once we define the acoustic and language models, we can solve for the most likely sequence of words using
the Viterbi algorithm.

Acoustic Model
1. An analog-to-digital converter measures the size of the current – which approximates the amplitude of the
sound wave – at discrete intervals; the number of measurements per second is called the sampling rate.
2. The precision of each measurement is determined by the quantization factor; speech recognizers typically
keep 8 to 12 bits. That means that a low-end system, sampling at 8 kHz with 8-bit quantization, would require
nearly half a megabyte per minute of speech (see the quick calculation after the figure below).
3. A phoneme is the smallest unit of sound that has a distinct meaning to speakers of a particular language.
For example, the “t” in “stick” sounds similar enough to the “t” in “tick” that speakers of English consider them
the same phoneme.
Each frame is summarized by a vector of features. Below picture represents phone model.

Figure. Translating the acoustic signal into a sequence of frames. In this diagram each frame is described by the
discretized values of three acoustic features; a real system would have dozens of features.
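A quick back-of-the-envelope check of the storage claim in item 2 above, assuming one byte per 8-bit sample:

sampling_rate = 8000          # samples per second (8 kHz)
bits_per_sample = 8
seconds = 60

bytes_per_minute = sampling_rate * bits_per_sample // 8 * seconds
print(bytes_per_minute / 2**20, "MiB per minute")   # ~0.46 MiB, i.e. nearly half a megabyte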

Building a speech recognizer


1. The quality of a speech recognition system depends on the quality of all its components – the language
model, the word-pronunciation models, the phone models, and the signal processing algorithms used to extract
spectral features from the acoustic signals.
2. The systems with the highest accuracy work by training a different model for each speaker, thereby capturing
differences in dialect as well as male / female and other variations. This training can require several hours of
interaction with the speaker, so the systems with the most widespread adoption do not create speaker-specific
models.
3. The accuracy of a system depends on several factors. First, the quality of the signal matters: a high-quality
directional microphone aimed at a stationary mouth in a padded room will do much better than a cheap
microphone transmitting a signal over phone lines from a car in traffic with the radio playing. The vocabulary
size matters: when recognizing digit strings with a vocabulary of 11 words (1–9 plus “oh” and “zero”), the word
error rate will be below 0.5%, whereas it rises to about 10% on news stories with a 20,000-word vocabulary,
and 20% on a corpus with a 64,000-word vocabulary. The task matters too: when the system is trying to
accomplish a specific task – book a flight or give directions to a restaurant – the task can often be
accomplished perfectly even with a word error rate of 10% or more.
3.14 ROBOT
1. Robots are physical agents that perform tasks by manipulating the physical world.
2. To do so, they are equipped with effectors such as legs, wheels, joints, and grippers.
3. Effectors have a single purpose: to assert physical forces on the environment.
4. Robots are also equipped with sensors, which allow them to perceive their environment.
5. Present day robotics employs a diverse set of sensors, including cameras and lasers to measure the
environment, and gyroscopes and accelerometers to measure the robot’s own motion.
6. Most of today’s robots fall into one of three primary categories. Manipulators, or robot arms, are physically
anchored to their workplace, for example in a factory assembly line or on the International Space Station. The
second category is the mobile robot, which moves about its environment using wheels or legs. The third
combines mobility with manipulation and is often called a mobile manipulator.
Robot Hardware
1. Sensors are the perceptual interface between robot and environment.
2. Passive sensors, such as cameras, are true observers of the environment: they capture signals that are
generated by other sources in the environment.
3. Active sensors, such as sonar, send energy into the environment. They rely on the fact that this energy is
reflected to the sensor. Active sensors tend to provide more information than passive sensors, but at the
expense of increased power consumption and with a danger of interference when multiple active sensors are
used at the same time. Whether active or passive, sensors can be divided into three types, depending on
whether they sense the environment, the robot’s location, or the robot’s internal configuration.
4. Range finders are sensors that measure the distance to nearby objects. In the early days of robotics, robots
were commonly equipped with sonar sensors. Sonar sensors emit directional sound waves, which are reflected
by objects, with some of the sound making it back to the sensor.
5. Stereo vision relies on multiple cameras to image the environment from slightly different viewpoints,
analyzing the resulting parallax in these images to compute the range of surrounding objects. For mobile ground
robots, sonar and stereo vision are now rarely used, because they are not reliably accurate.
6. Other range sensors use laser beams and special 1-pixel cameras that can be directed using complex
arrangements of mirrors or rotating elements. These sensors are called scanning lidars (short for light detection
and ranging).
7. Other common range sensors include radar, which is often the sensor of choice for UAVs. Radar sensors can
measure distances of multiple kilometers. On the other extreme end of range sensing are tactile sensors such
as whiskers, bump panels, and touch-sensitive skin. These sensors measure range based on physical contact
and can be deployed only for sensing objects very close to the robot.
8. A second important class of sensors is location sensors. Most location sensors use range sensing as a primary
component to determine location. Outdoors, the Global Positioning System (GPS) is the most common solution
to the localization problem.
9. The third important class is proprioceptive sensors, which inform the robot of its own motion. To measure
the exact configuration of a robotic joint, motors are often equipped with shaft decoders that count the
revolution of motors in small increments.
10. Inertial sensors, such as gyroscopes, rely on the resistance of mass to the change of velocity. They can help
reduce uncertainty.
11. Other important aspects of robot state are measured by force sensors and torque sensors. These are
indispensable when robots handle fragile objects or objects whose exact shape and location is unknown.
Robotic Perception
1. Perception is the process by which robots map sensor measurements into internal representations of the
environment. Perception is difficult because sensors are noisy, and the environment is partially observable,
unpredictable, and often dynamic. In other words, robots have all the problems of state estimation (or filtering).
2. As a rule of thumb, good internal representations for robots have three properties: they contain enough
information for the robot to make good decisions, they are structured so that they can be updated efficiently,
and they are natural in the sense that internal variables correspond to natural state variables in the physical
world.

Figure. Robot perception can be viewed as temporal inference from sequences of actions and measurements,
as illustrated by this dynamic Bayes network.
2. Another machine learning technique enables robots to continuously adapt to broad changes in sensor
measurements.
3. Adaptive perception techniques enable robots to adjust to such changes. Methods that make robots collect
their own training data (with labels!) are called self-supervised. In this instance, the robot uses machine learning
to leverage a short-range sensor that works well for terrain classification into a sensor that can see much farther.
PLANNING TO MOVE
1. All of a robot’s deliberations ultimately come down to deciding how to move effectors.
2. The point-to-point motion problem is to deliver the robot or its end effector to a designated target location.
3. A greater challenge is the compliant motion problem, in which a robot moves while being in physical contact
with an obstacle.
4. An example of compliant motion is a robot manipulator that screws in a light bulb, or a robot that pushes a
box across a tabletop. We begin by finding a suitable representation in which motion-planning problems can
be described and solved. It turns out that the configuration space—the space of robot states defined by
location, orientation, and joint angles—is a better place to work than the original 3D space.
5. The path planning problem is to find a path from one configuration to another in configuration space.
6. Here are two main approaches: cell decomposition and skeletonization. Each reduces the continuous path-
planning problem to a discrete graph-search problem. In this section, we assume that motion is deterministic,
and that localization of the robot is exact. Subsequent sections will relax these assumptions.
7. The second major family of path-planning algorithms is based on the idea of skeletonization. These
algorithms reduce the robot’s free space to a one-dimensional representation, for which the planning problem
is easier. This lower-dimensional representation is called a skeleton of the configuration space.
Configuration Space
1. Consider a robot arm with two joints that move independently. Moving the joints alters the (x, y) coordinates of the elbow and
the gripper. (The arm cannot move in the z direction.) This suggests that the robot’s configuration can be
described by a four-dimensional coordinate: (xe, ye) for the location of the elbow relative to the environment
and (xg, yg) for the location of the gripper. Clearly, these four coordinates characterize the full state of the
robot.
They constitute what is known as the workspace representation.
2. Configuration spaces have their own problems. The task of a robot is usually expressed in workspace
coordinates, not in configuration space coordinates. This raises the question of how to map between workspace
coordinates and configuration space.
3. These transformations are linear for prismatic joints and trigonometric for revolute joints. This chain of
coordinate transformations is known as kinematics (see the sketch at the end of this subsection).
4. The inverse problem of calculating the configuration of a robot whose effector location is specified in
workspace coordinates is known as inverse kinematics.
The configuration space can be decomposed into two subspaces: the space of all configurations that a robot
may attain, commonly called free space, and the space of unattainable configurations, called occupied space.
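A minimal Python sketch of the kinematics mapping discussed above: from configuration-space joint angles of a planar two-link arm to the workspace coordinates of the elbow and gripper. The link lengths are assumed values, and the trigonometric form reflects the revolute joints.

import math

L1, L2 = 1.0, 0.8   # link lengths (assumed)

def forward_kinematics(shoulder, elbow):
    """Return (xe, ye), (xg, yg) for the two joint angles, in radians."""
    xe, ye = L1 * math.cos(shoulder), L1 * math.sin(shoulder)
    xg = xe + L2 * math.cos(shoulder + elbow)   # gripper position builds on the elbow
    yg = ye + L2 * math.sin(shoulder + elbow)
    return (xe, ye), (xg, yg)

print(forward_kinematics(math.pi / 4, -math.pi / 6))

The inverse kinematics problem runs this mapping backwards, from a desired (xg, yg) to joint angles, and generally has zero, one, or several solutions.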

Cell Decomposition Methods


1. The simplest cell decomposition consists of a regularly spaced grid.
2. Grayscale shading indicates the value of each free-space grid cell—i.e., the cost of the shortest path from
that cell to the goal.
3. Cell decomposition methods can be improved in several ways to alleviate some of these problems. The first
approach allows further subdivision of the mixed cells, perhaps using cells of half the original size. This can be
continued recursively until a path is found that lies entirely within free cells. (Of course, the method only works
if there is a way to decide if a given cell is a mixed cell, which is easy only if the configuration space boundaries
have relatively simple mathematical descriptions.) This method is complete provided there is a bound on
the smallest passageway through which a solution must pass. One algorithm that implements this
is hybrid A*. (A grid-search sketch follows below.)
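As referenced above, a minimal sketch of path search over a regular grid decomposition: breadth-first search over free cells (hybrid A* and recursive subdivision are refinements of this basic idea). The tiny occupancy grid (1 = obstacle) is an assumption.

from collections import deque

grid = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

def grid_path(start, goal):
    frontier, parents = deque([start]), {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:                       # reconstruct the path back to start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[0])
                    and grid[nr][nc] == 0 and (nr, nc) not in parents):
                parents[(nr, nc)] = cell       # remember where we came from
                frontier.append((nr, nc))
    return None                                # no path through free cells

print(grid_path((0, 0), (2, 3)))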
Modified Cost Functions
1. A potential field is a function defined over state space, whose value grows as the distance to the closest
obstacle decreases.
2. The potential field can be used as an additional cost term in the shortest-path calculation.
3. This induces an interesting tradeoff. On the one hand, the robot seeks to minimize the path length to the goal.
On the other hand, it tries to stay away from obstacles by virtue of minimizing the potential function.
4. There exist many other ways to modify the cost function. For example, it may be desirable to smooth the
control parameters over time.
Skeletonization methods
1. The second major family of path-planning algorithms is based on the idea of skeletonization.
2. These algorithms reduce the robot’s free space to a one-dimensional representation, for which the planning
problem is easier.
3. This lower-dimensional representation is called a skeleton of the configuration space.
4. It is a Voronoi graph of the free space - the set of all points that are equidistant to two or more obstacles. To
do path planning with a Voronoi graph, the robot first changes its present configuration to a point on the
Voronoi graph.
5. It is easy to show that this can always be achieved by a straight-line motion in configuration space. Second,
the robot follows the Voronoi graph until it reaches the point nearest to the target configuration. Finally, the
robot leaves the Voronoi graph and moves to the target. Again, this final step involves straight-line motion in
configuration space.

Figure (a) A repelling potential field pushes the robot away from obstacles, (b) Path found by simultaneously
minimizing path length and the potential.

Figure (a) The Voronoi graph is the set of points equidistant to two or more obstacles in configuration space
(b) A probabilistic roadmap, composed of 100 randomly chosen points in free space.

Robust methods
1. A robust method is one that assumes a bounded amount of uncertainty in each aspect of a problem but does
not assign probabilities to values within the allowed interval.
2. A robust solution is one that works no matter what actual values occur, provided they are within the assumed
intervals.
3. An extreme form of robust method is the conformant planning approach.

Figure. A two-dimensional environment, velocity uncertainty cone, and envelope of possible robot motions.
The intended velocity is v, but with uncertainty the actual velocity could be anywhere in Cv, resulting in a final
configuration somewhere in the motion envelope, which means we wouldn’t know if we hit the hole or not.
Figure. The first motion command and the resulting envelope of possible robot motions. No matter what the
error, we know the final configuration will be to the left of the hole.