Knowledge Engineering
1.1 Introduction
The discipline of knowledge engineering grew out of the early work on expert systems in
the seventies. With the growing popularity of knowledge-based systems (as these were
by then called), there also arose a need for a systematic approach for building such
systems, similar to methodologies in mainstream software engineering. Over the years, the
discipline of knowledge engineering has evolved into the development of theory, methods
and tools for developing knowledge-intensive applications. In other words, it provides
guidance about when and how to apply particular knowledge-representation techniques for
solving particular problems.
In this chapter we first discuss (Sec. 1.2) a number of principles that have become the
baseline of modern knowledge engineering. These include the common distinction made
in knowledge engineering between task knowledge and domain knowledge. In Sec. 1.3
we explore the notion of problem-solving tasks in detail and present typical patterns and
methods used for solving such tasks. In Sec. 1.4 we focus on the domain perspective, in
particular the representation and use of ontologies. Finally, Sec. 1.5 summarizes the main
techniques that are being used in knowledge engineering, with examples of their use.
1.2 Baseline
The early expert systems were based on an architecture which separated domain
knowledge, in the form of a knowledge base of rules, from a general reasoning mechanism.
This distinction is still valid in knowledge engineering practice. In the early eighties a
number of key papers were published that set the scene for a systematic approach to
knowledge engineering.
In 1982 Newell published a paper on “The Knowledge Level” [28] in which he argued
the need for a description of knowledge at a level higher than the level of symbols in
knowledge-representation systems. The knowledge level was his proposal for realizing a
description of an AI system in terms of its rational behavior: why does the system (the
“agent”) perform this “action”, independent of its symbolic representation in rules, frames
or logic (the “symbol” level)? Descriptions at the knowledge level have since become a
principle underlying knowledge engineering.
Two other key publications came from Clancey. His “Epistemology of a rule-based
system” [8] can be viewed as a first knowledge-level description of a knowledge-based
system, in which he distinguished various knowledge types. Two years later his article
“Heuristic classification” appeared [9], which described a standard problem-solving
pattern in knowledge-level terms. Such patterns subsequently became an important focus of
knowledge-engineering research; these patterns typically serve as reusable pieces of task
knowledge. We treat these in more depth in Sec. 1.3.
In the nineties the attention of the knowledge-engineering community shifted gradually to
domain knowledge, in particular reusable representations in the form of ontologies. A key
paper, which also received quite wide attention outside the knowledge-engineering
community, was Gruber’s paper on portable ontologies [16]. During this decade ontologies are
receiving widespread attention as vehicles for sharing concepts within a distributed
community such as the web (e.g., see Chapter 21 on the Semantic Web). Similar to task
knowledge, patterns also play an important role in modeling domain knowledge. In Sec. 1.4
we describe in some detail the main issues in ontology engineering.
Unlike some other methods, which undo previous design decisions, P&R (propose-and-revise)
fixes them. P&R does not require an explicit description of components and their
connections. Basically, the method operates on one large bag of parameters. Invocation of
the propose task produces one new parameter assignment, the smallest possible extension of
an existing design. Domain-specific search-control knowledge guides the order of parameter
selection, based on the components they belong to. The verification task in P&R applies
a simple form of constraint evaluation. The method performs domain-specific calculations
Figure 1.1: Top-level reasoning strategy of the P&R method in the form of a UML activity
diagram
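The propose, verify and revise cycle described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the P&R implementation from the literature: the parameter names (`car_load`, `cable_diameter`), the fixed parameter order standing in for search-control knowledge, and the policy of applying the fix of the first violated constraint are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Design = Dict[str, int]


@dataclass
class Constraint:
    """A constraint over some parameters plus a domain-specific fix."""
    params: List[str]                 # parameters the constraint mentions
    check: Callable[[Design], bool]   # True when the constraint holds
    fix: Callable[[Design], Design]   # returns a repaired copy of the design


def propose_and_revise(order: List[str],
                       proposers: Dict[str, Callable[[Design], int]],
                       constraints: List[Constraint],
                       max_fixes: int = 10) -> Optional[Design]:
    """Extend the design one parameter at a time; repair violations with fixes.

    There is no chronological backtracking: a violated constraint is repaired
    by its fix, which may adjust already-assigned values. Returns None if the
    fixes do not resolve the violations within max_fixes attempts.
    """
    design: Design = {}
    for param in order:  # search control: a fixed parameter order (assumption)
        # Propose: the smallest extension, a single new assignment.
        design[param] = proposers[param](design)
        fixes_applied = 0
        while True:
            # Verify: evaluate only constraints whose parameters are all assigned.
            violated = [c for c in constraints
                        if all(p in design for p in c.params)
                        and not c.check(design)]
            if not violated:
                break
            if fixes_applied == max_fixes:
                return None
            # Revise: apply the fix of the first violated constraint.
            design = violated[0].fix(design)
            fixes_applied += 1
    return design


# Hypothetical toy domain: the cable must be thick enough for the load.
cable_ok = Constraint(
    params=["car_load", "cable_diameter"],
    check=lambda d: d["cable_diameter"] * 80 >= d["car_load"],
    fix=lambda d: {**d, "cable_diameter": d["cable_diameter"] + 1},
)
proposers = {"car_load": lambda d: 1000, "cable_diameter": lambda d: 10}
print(propose_and_revise(["car_load", "cable_diameter"], proposers, [cable_ok]))
# prints {'car_load': 1000, 'cable_diameter': 13}
```

The initial proposal of 10 violates the constraint, so the fix is applied three times (11, 12, 13) instead of retracting the earlier assignment, which mirrors the "fix rather than undo" character of the method.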
Assessment
Assessment is a task not often described in the AI literature, but of great
practical importance. Many assessment applications have been developed over the years,
typically for tasks in financial domains, such as assessing a loan or mortgage application,
or in the civil-service area, such as assessing whether a permit can be given. The task is
often confused with diagnosis, but whereas diagnosis is always concerned with some faulty
state of the system, assessment is aimed at producing a decision: e.g. yes/no to accept a
mortgage application. During the Internet hype at the start of this decade every bank was
Figure 1.2: Top-level reasoning strategy of the basic assessment method in the form of a
UML activity diagram
solve the standard cases and leave atypical ones for manual assessment.
type. Designing systems with the help of patterns is in fact a major trend in software
engineering at large; see, for example, the work of Gamma and colleagues [13] on design
patterns².
The knowledge-engineering literature provides a number of proposals for specification
frameworks and/or languages of problem-solving methods. These include the “Generic
Task” approach [6], “Role-Limiting Methods” [24], “Components of Expertise” [40], Protégé
[32], KADS [48, 49] and CommonKADS [39]. Although there are differences at a detailed
level between these approaches, they share one important commonality: all rely on the
notion of “knowledge role”.
Typical knowledge roles in the assessment method are “case data”, “norm” and
“decision”. These are method-specific names for the role that pieces of domain knowledge
play during reasoning. From a computational perspective, they limit the role that these
domain-knowledge elements can play, and therefore make problem solving more feasible
when compared to the “old” expert-systems idea of one large knowledge base with a
uniform reasoning strategy. In fact, the assumption behind problem-solving-method (PSM)
research is that the epistemological adequacy of the method gives one a handle on the
computational tractability of the system implementation based on it. This issue is of course
a long-standing debate in knowledge representation at large (see e.g., [4]).
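To make the notion of a knowledge role concrete, the following Python sketch wires a toy assessment method to the three roles named above: case data, norm and decision. The role names come from the text; everything else is a hypothetical illustration, including the field names, the two mortgage norms, and the assumption that a case covered by no norm is referred for manual assessment. It is not the CommonKADS assessment method itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Union

CaseData = Dict[str, Union[int, float, bool]]   # role: "case data"


class Decision(Enum):                             # role: "decision"
    ACCEPT = "accept"
    REJECT = "reject"
    MANUAL = "manual assessment"


@dataclass
class Norm:                                       # role: "norm"
    name: str
    applies: Callable[[CaseData], bool]   # does the norm cover this case?
    holds: Callable[[CaseData], bool]     # does the case satisfy the norm?


def assess(case: CaseData, norms: List[Norm]) -> Decision:
    """Method-level reasoning: domain knowledge is only reached through roles."""
    relevant = [n for n in norms if n.applies(case)]
    if not relevant:
        return Decision.MANUAL        # atypical case: left to a human (assumption)
    if all(n.holds(case) for n in relevant):
        return Decision.ACCEPT
    return Decision.REJECT


# Hypothetical domain knowledge plugged into the "norm" role.
norms = [
    Norm("income ratio",
         applies=lambda c: "income" in c and "loan" in c,
         holds=lambda c: c["loan"] <= 5 * c["income"]),
    Norm("no payment arrears",
         applies=lambda c: "arrears" in c,
         holds=lambda c: not c["arrears"]),
]

print(assess({"income": 50_000, "loan": 200_000, "arrears": False}, norms))
```

The function `assess` never refers to incomes or arrears directly; supplying a different list of norms reuses the same method in another assessment domain, which is one way to read the claim that knowledge roles limit, and thereby structure, how domain knowledge is used.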
Another issue that frequently comes up in discussions about problem-solving methods is
their correspondence with human reasoning. Early work on KADS used problem-solving
methods as a coding scheme for expertise data [47]. Over the years a consensus has grown
that, while human reasoning can form an important source of inspiration for
problem-solving methods and while it is useful to use cognitively plausible terms for
knowledge roles, the problem-solving strategy may well be different. Machines
have different qualities than humans. For example, a method that requires a large memory
space cannot be carried out by a human expert, but presents no problem to a computer
program. In particular, problem-solving methods for synthetic tasks, where the solution
space is usually large, often have no counterpart in human problem solving.
2 Problem-solving methods would be called “strategy patterns” in the terminology of Gamma et al.
the languages that were developed followed the maxim of structure-preserving
specification [44]: if the structure of the formal specification closely follows the structure
of the informal knowledge model, any problems found during verification activities
performed on the formal model can easily be translated into possible repairs of the original
knowledge model.
In particular the CommonKADS framework was the subject of a number of formalisation
attempts; see [12] for an extensive survey. Such languages would follow the structure
of the CommonKADS model, dividing a specification into (1) a domain layer, where an
ontology is specified describing the categories of the domain knowledge and the
relationships between these categories (i.e. the boxes in Fig. 1.5); (2) knowledge roles,
which link the components of the method to elements of the application domain;
(3) inference steps, which are the atomic elements of a problem-solving method (i.e. the
ovals in Fig. 1.2); and (4) a task definition, which imposes a control structure over the
inference steps to complete the definition of the problem-solving method.
A simplified example is shown in Fig. 1.3, using a reduced version of the syntax of (ML)²
[45]:
• the domain layer specifies a number of declarative facts in the domain. These facts
are already organised in three different modules.
• the inference steps then specify how the knowledge roles, which map each domain
module onto a method-specific role, can be used in a problem-solving method: an
abstraction step consists of a deductive (modus ponens) step over an abstraction rule,
whereas a hypothesise step consists of an abductive step over a causation rule.
• finally, the task model specifies how these atomic inference steps must be strung
together procedurally to form a problem-solving method: in this case, a sequence of a
deductive abstraction step followed by an abductive hypothesise step.
The impact of languages such as (ML)² [45], KARL [11] and many others (see [12]) was
in one sense very limited: although the knowledge modelling methods are in widespread
use, the corresponding formal languages have not been widely adopted. Rather than direct
adoption, their influence is perhaps mostly seen through the fact that they forced a much
more precise formulation of the principles behind the knowledge modelling methods.
There is renewed activity in the area of formal languages for problem-solving methods
at the time of writing, driven by interest from the area of web services. Web services
are composed into workflows, and these workflows often exhibit typical patterns (e.g.
browse-order-pay-ship, or search-retrieve-process-report). Problem-solving methods are
essentially reusable workflows of reasoning patterns, and the established lessons from
problem-solving methods may well be applicable to this new area.
DOMAIN
patient-data: temp(patient1) = 38
symptom-definitions: temp(P) > 37 → fever(P)
symptomatology: hepatitis(P) → fever(P)
KNOWLEDGE ROLES
from patient-data: A ↦ data(A)
from symptom-definitions: A → B ↦ abstraction(A, B)
from symptomatology: A → B ↦ causation(A, B)
INFERENCE
abstract(A1, A2): data(A1) ∧ abstraction(A1, A2) → observation(A2)
hypothesise(B1, B2): observation(B2) ∧ causation(B1, B2) → hypothesis(B1)
TASK
begin abstract(A, B) ; hypothesise(B, C) end
Figure 1.3: A simple problem-solving method specification in the style of (ML)2
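The layered specification in Fig. 1.3 can also be read operationally. The Python sketch below is a hypothetical encoding, not (ML)² itself: the domain layer becomes plain data for a single patient, each knowledge-role mapping is a function, `abstract` applies abstraction rules deductively (modus ponens), and `hypothesise` applies causation rules abductively, reasoning from an observed effect back to a candidate cause. Representing the symptom definition as a threshold on a named attribute is an assumption made to keep the example small.

```python
from typing import Dict, List, Set, Tuple

# DOMAIN layer (three modules), encoded as plain data for one patient.
patient_data: Dict[str, float] = {"temp": 38.0}                  # temp(patient1) = 38
symptom_definitions: List[Tuple[str, float, str]] = [("temp", 37.0, "fever")]
symptomatology: List[Tuple[str, str]] = [("hepatitis", "fever")]  # hepatitis -> fever


# KNOWLEDGE ROLES: each domain module is viewed through a method-specific role.
def data() -> Dict[str, float]:
    return patient_data


def abstraction() -> List[Tuple[str, float, str]]:
    return symptom_definitions


def causation() -> List[Tuple[str, str]]:
    return symptomatology


# INFERENCE steps.
def abstract(findings: Dict[str, float]) -> Set[str]:
    """Deduction: data(A1) and abstraction(A1, A2) give observation(A2)."""
    return {obs for attr, threshold, obs in abstraction()
            if attr in findings and findings[attr] > threshold}


def hypothesise(observations: Set[str]) -> Set[str]:
    """Abduction: observation(B2) and causation(B1, B2) give hypothesis(B1)."""
    return {cause for cause, effect in causation() if effect in observations}


# TASK layer: an abstraction step followed by a hypothesise step.
def diagnose() -> Set[str]:
    return hypothesise(abstract(data()))


print(diagnose())  # prints {'hepatitis'}
```

Because the inference functions only touch the domain through `data`, `abstraction` and `causation`, replacing the three domain modules leaves the method unchanged, which is the separation the four-layer structure is meant to enforce.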
1.4 Ontologies
During the nineties ontologies became popular in computer science. Gruber [16] defines
an ontology as an “explicit specification of a conceptualization”. Several authors have made
small adaptations to this. A common definition nowadays is:
Definition 3. An ontology is an explicit specification of a shared conceptualization that
holds in a particular context.
The addition of the adjective “shared” is important, as the primary goal of ontologies
in computer science was to enable knowledge sharing. Until the end of the nineties
“ontology” was a niche term, used by a few researchers in the knowledge engineering and
representation field³. The term is now in widespread use, mainly due to the enormous need
for shared concepts in the distributed world of the web. Both people and programs need to
share at least some minimal common vocabulary. Ontologies have become particularly popular
in the context of the Semantic Web effort; see Chapter 21.
In practice, we are confronted with many different conceptualizations, i.e. ways of
viewing the world. Even in a single domain there can be multiple viewpoints. Take for
example the concept of a heat exchanger as shown in Fig. 1.4. The conceptualization of
a heat exchanger can be very different, depending on whether we take the viewpoint of
the physical structure, the internals of the process, or the operational management.
“Context” is therefore an important notion when reusing an ontology. We cannot expect
other people or programs to understand our conceptualization if we do not explicate what
the context of the ontology is. Lenat [21] has made an attempt to define a theory of context
spaces. In practice, we see most often that context is defined through typing the
ontology. We discuss ontology types in Sec. 1.4.2.
The plural form used in the title of this section is revealing. The notion of ontology has
been a subject of debate in philosophy for many centuries. The study of ontology, or the
theory of “that which is” (from the Greek “ontos” = being), has been a discipline in its own
right since the days of Aristotle, who can be seen as its founder and main source of
inspiration. The plural form signifies the pragmatic use made of the notion in modern
computer science. We now talk about “ontologies” because the state of the art does not
provide us with a single theory of what exists.
3 At a preparation meeting for a DARPA program in this area in 1995, the rumors were that DARPA manage-
Other languages, in particular conceptual graphs (see Chapter 5), have been popular for
specifying ontologies. Recently, OWL has gained wide popularity. OWL is the W3C Web
Ontology Language [46]. Its syntax is XML-based. Things defined in OWL get a URI,
which simplifies reuse. OWL sails between the Scylla of expressiveness and the Charybdis
of computability by defining a subset of OWL (OWL DL) that is equivalent to a
well-understood fragment of description logic (see Chapter 3). Users who limit themselves
to this fragment of OWL get some guarantees with respect to computability. The OWL user is
free to step outside the bounds of OWL DL if s/he requires additional expressive power. An
overview of OWL is given in Chapter 21.
One might ask whether the use of description logic as a basis for an ontology language
contradicts the statement at the start of this section, namely that ontologies are
not specified with a reasoning mechanism in mind. It is undoubtedly true that the DL
reasoning paradigm biases the way one models the world with OWL. However, subclass
modeling appears to be an intrinsic feature of modeling domain knowledge. DL-style
modeling of domain knowledge has been popular since the early days of
KL-ONE [5]. Also, DL reasoning is often mainly used to validate the ontology; typically,
additional reasoning knowledge is needed in applications. The fact that the Web community
is defining a separate rule language to complement OWL is also evidence for this. Still, one
could take the view that a more general first-order language would be better for ontology
specification, as it introduces less bias and provides the possibility of specifying reasoning
within the same language. If one takes this position, a language like KIF [15] is a prime
candidate as ontology language.
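As an illustration of DL reasoning used to validate an ontology, the sketch below performs one small check that a DL reasoner would also perform: a class that falls, through the subclass hierarchy, under two classes declared disjoint cannot have any instances. It is a hypothetical toy with invented class names that handles only named subclass and disjointness axioms; it is neither OWL nor a description-logic reasoner.

```python
from typing import Dict, List, Set, Tuple


def ancestors(cls: str, subclass_of: Dict[str, List[str]]) -> Set[str]:
    """All superclasses of cls, following subclass links transitively."""
    seen: Set[str] = set()
    stack = list(subclass_of.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:       # guards against cycles in the hierarchy
            seen.add(parent)
            stack.extend(subclass_of.get(parent, []))
    return seen


def unsatisfiable(subclass_of: Dict[str, List[str]],
                  disjoint: List[Tuple[str, str]]) -> Set[str]:
    """Classes that fall under both members of some disjoint pair."""
    bad: Set[str] = set()
    for cls in subclass_of:
        supers = ancestors(cls, subclass_of) | {cls}
        if any(a in supers and b in supers for a, b in disjoint):
            bad.add(cls)
    return bad


# Hypothetical modeling error: a class placed under two disjoint branches.
subclass_of = {
    "HeatExchanger": ["Device"],
    "Device": ["PhysicalObject"],
    "Process": ["Occurrent"],
    "HeatTransfer": ["Process", "Device"],
}
disjoint = [("PhysicalObject", "Occurrent")]
print(unsatisfiable(subclass_of, disjoint))  # prints {'HeatTransfer'}
```

The check flags a modeling mistake rather than deriving new application-level conclusions, which fits the observation that DL reasoning mostly serves ontology validation while applications need additional reasoning knowledge.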
part of a club, my hand is part of me, but this doesn’t imply my hand is part of the club”).
Several revised versions of this taxonomy have been published [30, 2].
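The quoted example can be made concrete under one common reading of it: transitivity is assumed to hold only along chains of part-whole links of the same kind. In the sketch below each link is tagged with a kind and the transitive closure is computed separately per kind, so a chain that mixes kinds yields no inference. The kind labels `component` and `member` and the facts are hypothetical illustrations; published taxonomies distinguish more kinds and differ in detail.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# (part, kind, whole) facts; the kinds and facts are hypothetical.
FACTS: Set[Tuple[str, str, str]] = {
    ("hand", "component", "me"),
    ("finger", "component", "hand"),
    ("me", "member", "club"),
}


def closure_by_kind(facts: Set[Tuple[str, str, str]]) -> Set[Tuple[str, str, str]]:
    """Transitive closure computed separately for each kind of part-whole link."""
    by_kind: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for part, kind, whole in facts:
        by_kind[kind].add((part, whole))
    result: Set[Tuple[str, str, str]] = set()
    for kind, pairs in by_kind.items():
        closed = set(pairs)
        changed = True
        while changed:  # naive fixpoint: compose links until nothing new appears
            new = {(a, d) for a, b in closed for c, d in closed if b == c}
            changed = not new <= closed
            closed |= new
        result |= {(a, kind, b) for a, b in closed}
    return result


inferred = closure_by_kind(FACTS)
print(("finger", "component", "me") in inferred)                 # prints True
print(any(p == "hand" and w == "club" for p, _, w in inferred))  # prints False
```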
Lexical resources such as WordNet⁷ [26] can also be seen as foundational ontologies,
although with a weaker semantic structure. WordNet defines a semantic network with
17 different relation types between concepts used in natural language. Researchers in
this area are proposing richer semantic structuring for WordNet (e.g. [31]). The original
Princeton WordNet targets American English; WordNets now exist or are being developed
for almost all major languages.
7 http://wordnet.princeton.edu/
8 http://www.nlm.nih.gov/pubs/factsheets/umls.html
9 http://www.cs.man.ac.uk/~rector/ontologies/simple-top-bio/
10 http://www.geneontology.org/
11 http://www.w3.org/2004/02/skos/
12 http://www.getty.edu/research/conducting_research/vocabularies/
Figure 1.5: Configuration-design ontology in the VT experiment [18] (in the form of a
UML class diagram)
ically operate on an ontology of states and state transitions. Tate’s plan ontology [42] is
another example of a task-specific ontology.
be used with an ontology language that supports only binary relations, such as OWL) and
the work of Rector on patterns for defining value sets [34]. Gangemi has published a set of
design patterns for a wide range of modeling situations [14].
• Sorting techniques are used for capturing the way people compare and order
concepts, and can lead to the revelation of knowledge about classes, properties and
priorities.
• Diagram-based techniques include the generation and use of concept maps, state
transition networks, event diagrams and process maps. The use of these is
particularly important in capturing the “what, how, when, who and why” of tasks and
events.
Specialised tool support has been developed for each of these techniques. Table 1.2
briefly describes some of these techniques, and correlates them with the appropriate tool
support.
This wide variety of techniques is required to access the many different types of
knowledge possessed by experts. This is referred to as the Differential Access Hypothesis,
for which there is supporting experimental evidence.
Fig. 1.6 below presents the various techniques described above and shows the types of
knowledge they are mainly aimed at eliciting. The vertical axis on the figure represents the
dimension from object knowledge to process knowledge, and the horizontal axis represents
the dimension from explicit knowledge to tacit knowledge. The details of these techniques
are described in a number of survey articles and textbooks, such as [3], [37, Ch. 8], and
[25].
Bibliography