Human-Aided Multi-Entity Bayesian Networks Learning From Relational Data
Abstract
An Artificial Intelligence (AI) system is an autonomous system which emulates human mental and physical activities such as Observe, Orient, Decide, and Act, called the OODA process. An AI system performing the OODA process requires a semantically rich representation to handle complex real-world situations and the ability to reason under uncertainty about those situations. Multi-Entity Bayesian Networks (MEBN) combines First-Order Logic with Bayesian Networks for
representing and reasoning about uncertainty in complex, knowledge-rich domains. MEBN goes
beyond standard Bayesian networks to enable reasoning about an unknown number of entities
interacting with each other in various types of relationships, a key requirement for the OODA
process of an AI system. MEBN models have heretofore been constructed manually by a domain
expert. However, manual MEBN modeling is labor-intensive and insufficiently agile. To address
these problems, an efficient method is needed for MEBN modeling. One such method is to use machine learning to learn a MEBN model in whole or in part from data. In the era of Big Data, data-rich environments, characterized by uncertainty and complexity, have become ubiquitous. The larger the data sample, the more accurate the results of machine learning can be. Therefore, machine learning has the potential to improve the quality of MEBN models as well as the effectiveness of MEBN modeling. In this research, we study a MEBN learning framework to develop a MEBN model from a combination of domain expert knowledge and data. To evaluate the MEBN learning framework, we conduct an experiment comparing the MEBN learning framework with the existing manual MEBN modeling in terms of development efficiency.
1 Introduction
An Artificial Intelligence (AI) system is an autonomous system which emulates human mental and physical activities such as the OODA process [Boyd, 1976][Boyd, 1987]. The OODA process contains four steps (Observe, Orient, Decide, and Act). In the Observe step, data or signals from every mental/physical situation (e.g., states, activities, and goals) of external systems (e.g., an adversary) as well as internal systems (e.g., a command center or an allied army) in the world are observed under some internal observing guidance or control, and observations derived from the data or signals are produced. In the Orient step, observations become information, formed as a model, through reasoning, analysis, and synthesis influenced by knowledge, beliefs, conditions, etc. The Orient step can produce plans and courses of action (COAs). Hypotheses or alternatives for models are decided by the AI in the Decide step. In the Act step, all decided results are implemented, and real activities and states can be operated and produced, respectively. The four steps continue until the end of the life cycle of the AI system.
An AI system performing the OODA process requires a semantically rich representation to
handle situations in a complex real and/or cyber world. Furthermore, the number of entities and
the relationships among them may be uncertain. For this reason, the AI system needs an
expressive formal language for representing and reasoning about uncertain, complex, and
dynamic situations. Multi-Entity Bayesian Networks (MEBN) [Laskey, 2008] combines First-Order Logic with Bayesian Networks (BNs) [Pearl, 1988] for representing and reasoning about
uncertainty in complex, knowledge-rich domains. MEBN goes beyond standard Bayesian
networks to enable reasoning about an unknown number of entities interacting with each other in
various types of relationships, a key requirement for the AI system.
MEBN has been applied to AI systems [Laskey et al., 2000][Wright et al., 2002][Costa et al.,
2005][Suzic, 2005][Costa et al., 2012][Park et al., 2014][Golestan, 2016][Li et al., 2016][Park et
al., 2017]. In a recent review of knowledge representation formalisms for AI, Golestan et al.
[2016] recommended MEBN as having the most comprehensive coverage of features needed to
represent complex situations. Patnaikuni et al. [2017] reviewed various applications using
MEBN.
In previous applications of MEBN to AI systems, the MEBN model or MTheory was
constructed manually by a domain expert using a MEBN modeling process such as Uncertainty
Modeling Process for Semantic Technology (UMP-ST) [Carvalho et al., 2016]. Manual MEBN
modeling is a labor-intensive and insufficiently agile process. Greater automation through
machine learning may save labor and enhance agility. For this reason, Park et al. [2016]
introduced a process model called Human-aided Multi-Entity Bayesian Networks Learning for
Predictive Situation Awareness by combining domain expertise with data. The process model was
focused on the predictive situation awareness (PSAW) domain. However, the process model need not be applied only to the PSAW domain. This paper defines a general process model for Human-aided Multi-Entity Bayesian Networks Learning 1, called HML. HML specifies four steps with guidelines on how to incorporate (1) domain knowledge, (2) a database model, and (3) MEBN learning. Thus, the general process model can be reused across a variety of domains to develop a domain-specific MEBN model (e.g., predictive situation awareness, planning, natural language processing, and system modeling). (1) Domain knowledge
can be specified by a reference model which is an abstract framework to which a developer refers
in order to develop a specific model. Such a reference model can support the design of a MEBN model in a certain domain and improve the quality of the MEBN model. (2) A database model can support the design of a MEBN model for automation, if there are common elements between the database model and the MEBN model. For example, the Relational Model (RM), a database model based on first-order predicate logic [Codd, 1969; Codd, 1970] and the most widely used data model in the world, represents entities and attributes. Such entities and attributes in RM can be mapped to entities and random variables in MEBN, respectively. Thus, common elements between a database model and MEBN can be used for automated conversion. In this research, we introduce the use of RM as the database model for MEBN learning. (3) MEBN learning is learning an optimal MEBN model that fits well the observed datasets in the database model. MEBN learning can be classified into two types: one is MEBN structure learning (e.g., finding optimal structures of MEBN) and the other is MEBN parameter learning (e.g., finding an optimal set of parameters for the local distributions of random variables in MEBN). In this research, MEBN parameter learning is introduced. Overall, HML contains three supportive methodologies:
(1) a domain reference model (e.g., a reference model for predictive situation awareness, planning,
natural language processing, or system modeling), (2) a mapping between MEBN and a database
model (e.g., RM), and (3) MEBN learning (e.g., a conditional linear Gaussian parameter learning
1 This paper is an extension of the conference paper [Park et al., 2016].
for MEBN) to develop a MEBN model efficiently and effectively. In this research, we conduct an
experiment to compare Human-aided Multi-Entity Bayesian Networks Learning (HML) and the
existing manual MEBN modeling in terms of development efficiency.
Section 2 provides background information about MEBN and an existing MEBN modeling
process. In Section 3, a relational database, which is an illustrative database model used to
explain HML, is introduced. In Section 4, HML is presented with an illustrative example. In
Section 5, an experiment comparing between HML and the existing MEBN modeling process is
introduced. Finally, conclusions are presented and future research directions are discussed.
2 Background
This section provides background information about Multi-Entity Bayesian Networks (MEBNs),
a script form of MEBN, and Uncertainty Modeling Process for Semantic Technology (UMP-ST).
In Section 2.1, MEBN as a representation formalism is presented with some definitions and an
example for MEBN. In Section 2.2, a simple script form of MEBN is introduced. HML in this
research is a modification of UMP-ST, so UMP-ST is introduced in Section 2.3.
To understand how this works, consider Fig. 1, which shows an MTheory called the Threat
Assessment MTheory. This MTheory contains six MFrags: Vehicle, MTI_Condition, Context,
Situation, Speed, and Speed_Report. An MFrag (e.g., Fig. 2) may contain three types of random
variables: context RVs, denoted by green pentagons, resident RVs, denoted by yellow ovals, and
input RVs, denoted by gray trapezoids. Each MFrag defines local probability distributions for its resident RVs. These distributions may depend on the input RVs, whose distributions are defined in
other MFrags. Context RVs express conditions that must be satisfied for the distributions defined
in the MFrag to apply.
Specifically, Fig. 2 shows the Situation MFrag (from the Threat Assessment MTheory) used for
an illustrative example of an MFrag. The Situation MFrag represents probabilistic knowledge of
how the threat level of a region at a time is measured depending on the vehicle type of detected
objects. For example, if in a region there are many tracked vehicles (e.g., Tanks), the threat level
of the region will be high. An MFrag consists of a set of resident nodes, a set of context nodes, a
set of input nodes, an acyclic directed graph for the nodes, and a set of class local distributions
(CLD) for the nodes. The context nodes (i.e., isA(v, VEHICLE), isA(rgn, REGION), isA(t, TIME),
and rgn = Location(v, t)) for this MFrag (shown as pentagons in the figure) show that this MFrag
applies when a vehicle entity is substituted for the ordinary variable v, a region entity is
substituted for the ordinary variable rgn, a time entity is substituted for the ordinary variable t,
and a vehicle v is located in region rgn at time t. The context node rgn = Location(v, t) constrains
the values of v, rgn, and t from the possible instances of vehicle, region, and time, respectively.
For example, suppose v1 and v2 are vehicles and r1 is a region in which only v1 is located at
time t1. The context node rgn = Location(v, t) will allow only an instance of (v1, r1, t1) to be
selected, but not (v2, r1, t1), because r1 is not the location of v2 at t1. Next, we see the input node
VehicleType(v), depicted as a trapezoid. Input nodes are nodes whose distribution is defined in
another MFrag. For example, a resident node VehicleType(v) is found in the MFrag Vehicle from
the top left in Fig. 1.
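The context-node filtering just described can be sketched in a few lines of Python. This is an illustrative sketch only: the location table and candidate bindings below are hypothetical values, not data from the paper.

```python
# Hypothetical location facts: Location(v, t) = rgn
location = {("v1", "t1"): "r1"}

# Candidate bindings (v, rgn, t) for the ordinary variables of the MFrag
candidates = [("v1", "r1", "t1"), ("v2", "r1", "t1")]

# Keep only bindings satisfying the context node rgn = Location(v, t)
valid = [(v, rgn, t) for (v, rgn, t) in candidates
         if location.get((v, t)) == rgn]

print(valid)  # only the binding for v1 survives, since r1 is not the location of v2
```

As in the text, (v2, r1, t1) is rejected because r1 is not the location of v2 at t1.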
In Fig. 2, the node ThreatLevel(rgn, t) is a resident node, which means its distribution is defined
in the MFrag of the figure. Like the graph of a common BN, the fragment graph shows
probabilistic dependencies. CLD 2.1 in the script below shows that a class local distribution for
ThreatLevel(rgn, t) describes its probability distribution as a function of the input nodes given the
instances that satisfy the context nodes. The class local distribution (CLD) L_C can be used to produce an instance local distribution (ILD) L_I in an SSBN. Note that in Subsection 4.3.2, the CLD and ILD are defined formally. In this subsection, we introduce the CLD with a simple
illustrative example. The class local distribution of ThreatLevel(rgn, t), which depends on the
type of vehicle, can be expressed as CLD 2.1. The CLD is defined in a language called Local
Probability Description Language (LPDL). In our example, the probabilities of the states, High and Low, of ThreatLevel(rgn, t) are defined as a function of the values of the instances of the parent node VehicleType(v) that satisfy the context constraints. For the high
state in the first if-scope in CLD 2.1, the probability value is assigned by the function described
by “1 – 1 / CARDINALITY(v)”. The CARDINALITY function returns the number of instances
of v satisfying the if-condition. For example, in CLD 2.1, if the situation involves three vehicles
and two of them are tracked, then the CARDINALITY function will return 2. We see that as the
number of tracked vehicles becomes very large, the function, “1 – 1 / CARDINALITY(v)”, will
tend to 1. This means the threat level of the region will be very high.
CLD 2.1: The class local distribution for the resident node ThreatLevel(rgn, t)
1 if some v have (VehicleType = Tracked) [
2 High = 1 – 1 / CARDINALITY(v),
3 Low = 1 – High
4 ] else [
5 High = 0,
6 Low = 1
7 ]
CLD 2.2: The class local distribution for the continuous resident node ThreatLevel(rgn, t)
1 if some v have (VehicleType = Tracked) [
2 10 * CARDINALITY(v) + NormalDist(10, 5)
3 ] else [
4 NormalDist(10, 5)
5 ]
The meaning of CLD 2.2 is that the degree of the threat in the region is 10 * the number of
tracked vehicles plus a normally distributed error with mean 10 and variance 5. Currently, LPDL
limits continuous nodes to conditional linear Gaussian (CLG) distributions [Sun et al., 2010],
defined as:
p(R | Pa(R), CF_j) = N(m + b_1 P_1 + b_2 P_2 + ... + b_n P_n, σ^2),   (2.1)

where Pa(R) = {P_1, ..., P_n} is the set of continuous parent resident nodes of the continuous resident node R, P_i is the i-th continuous parent node, CF_j is the j-th configuration of the discrete parents of R (e.g., CF = {CF_1 = (VehicleType = Tracked), CF_2 = (VehicleType = Wheeled)}), m is a regression intercept, σ^2 is the conditional variance, and b_i is the i-th regression coefficient.
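Under the CLG form, the conditional distribution of a continuous resident node is Gaussian with a mean that is linear in its continuous parents. The sketch below illustrates this; the coefficients are chosen to mirror the tracked branch of CLD 2.2 (treating 10 * CARDINALITY(v) as the single linear parent term), which is our reading, not an official LPDL semantics.

```python
import random

def clg_sample(m, coeffs, parents, var):
    """Sample R ~ N(m + sum_i b_i * P_i, var) for one discrete-parent configuration."""
    mean = m + sum(b * p for b, p in zip(coeffs, parents))
    return random.gauss(mean, var ** 0.5)

# CLD 2.2, tracked branch: 10 * CARDINALITY(v) + NormalDist(10, 5),
# i.e. m = 10, b_1 = 10, P_1 = CARDINALITY(v), sigma^2 = 5
sample = clg_sample(m=10, coeffs=[10], parents=[3], var=5)  # 3 tracked vehicles
print(sample)  # a draw centred near 10 + 10 * 3 = 40
```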
Using the above MTheory example, we define elements of MTheory more precisely. The
following definitions are taken from [Laskey, 2008].
Definition 2.1 (MFrag) An MFrag F, or MEBN fragment, consists of: (i) a set C of context nodes, which represent conditions under which the distribution defined in the MFrag is valid; (ii) a set I of input nodes, which have their distributions defined elsewhere and condition the distributions defined in the MFrag; (iii) a set R of resident nodes, whose distributions are defined in the MFrag 2; (iv) an acyclic directed graph G, whose nodes are associated with resident and input nodes; and (v) a set L_C of class local distributions, in which an element of L_C is associated with each resident node.
The nodes in an MFrag are different from the nodes in a common BN. A node in a common BN
represents a single random variable, whereas a node in an MFrag represents a collection of RVs:
those formed by replacing the ordinary variables with identifiers of entity instances that meet the
context conditions. To emphasize the distinction, we call the nodes in an MFrag MEBN nodes, or MNodes.
MNodes correspond to predicates (for true/false RVs) or terms (for other RVs) of first-order logic.
An MNode is written as a predicate or term followed by a parenthesized list of ordinary variables
as arguments.
Definition 2.2 (MNode) An MNode, or MEBN Node, is a random variable N(ff) specified by an n-ary function or predicate of first-order logic (FOL), a list of n arguments consisting of ordinary
variables, a set of mutually exclusive and collectively exhaustive possible values, and an
associated class local distribution. The special values true and false are the possible values for
predicates, but may not be possible values for functions. The RVs associated with the MNode are
constructed by substituting domain entities for the n arguments of the function or predicate. The
class local distribution specifies how to define local distributions for these RVs.
For example, the node ThreatLevel(rgn, t) in Fig. 2 is an MNode specified by a FOL function
ThreatLevel(rgn, t) having two possible values (i.e., High and Low).
Definition 2.3 (MTheory) An MTheory M, or MEBN Theory, is a collection of MFrags that
satisfies conditions given in [Laskey, 2008] ensuring the existence of a unique joint distribution
over its random variables.
An MTheory is a collection of MFrags that defines a consistent joint distribution over random
variables describing a domain. The MFrags forming an MTheory should be mutually consistent.
To ensure consistency, conditions must be satisfied such as no-cycle, bounded causal depth,
unique home MFrags, and recursive specification condition [Laskey, 2008]. No-cycle means that
the generated SSBN will contain no directed cycles. Bounded causal depth means that the depth from a root node to a leaf node of an instance SSBN must be finite. Unique home MFrags means that
each random variable has its distribution defined in a single MFrag, called its home MFrag.
Recursive specification means that MEBN provides a means for defining the distribution for an
RV depending on an ordered ordinary variable from previous instances of the RV.
The IsA random variable is a special RV representing the type of an entity. IsA is commonly used
as a context node to specify the type of entity that can be substituted for an ordinary variable in an
MNode.
Definition 2.4 (IsA random variable) An IsA random variable, IsA(ov, tp), is an RV
corresponding to a 2-argument FOL predicate. The IsA RV has value true when its second
argument tp is filled by the type of its first argument ov and false otherwise.
For example, in the Situation MFrag in Fig. 2, isA(v, VEHICLE) is an IsA RV. Its first argument v
is filled by an entity instance and its second argument is the type symbol VEHICLE. It has value
2 Bold italic letters are used to denote sets.
true when its first argument is filled by an object of type VEHICLE.
The script contains several predefined single letters (F, C, R, IP, RP, and L). The single letters, F,
C, and R denote an MFrag, a context node, and a resident node, respectively. For a resident node
(e.g., Y) in an MFrag, a resident parent (RP) node (e.g., X), which is defined in the MFrag, is
denoted as RP (e.g., [R: Y [RP: X]]). For an input node, we use a single letter IP. Each node can
contain a CLD denoted as L. For example, suppose that there is a CLD type called
ThreatLevelCLD. If the resident node ThreatLevel in Line 4 uses the CLD type ThreatLevelCLD,
the resident node ThreatLevel can be represented as [R: ThreatLevel (rgn, t) [L:
ThreatLevelCLD]].
Fig. 3 shows a schema for the threat assessment RDB. In the example RDB schema, there are 14
relations: Region, Situation, Location, Time, Speed, Speed_Report, ActualObject, ObserverOf,
Vehicle, VehicleType, Predecessor, ReportedVehicle, MTI, and MTI_Condition. The relation
Region is for region information in this situation which can contain a region index (e.g., region1
and region2). The relation Time is for time information which is a time stamp representing a time
interval (e.g., t1 and t2). The relation Vehicle is for vehicle information which is an index of a
ground-vehicle (e.g., v1 and v2). The relation VehicleType indicates a type of the vehicle (e.g.,
Wheeled and Tracked). The relation MTI is for a moving target indicator (e.g., mti1 and mti2). An
MTI can be in a condition (e.g., Good and Bad) depending on weather and/or maintenance
conditions. The relation MTI_Condition indicates the condition of an MTI. The relation Location
is for a location where a vehicle is located. The relation Situation indicates a threat level to a
region at a time (e.g., Low and High). The relation ReportedVehicle indicates a reported vehicle
from an MTI. The relation Speed indicates an actual speed of a vehicle, while the relation
Speed_Report indicates a reported speed of the vehicle from an MTI. The relation ActualObject
maps a reported vehicle to an actual vehicle. The relation ObserverOf indicates that an MTI
observes a vehicle. The relation Predecessor indicates a temporal order between two time stamps.
Table 1 shows parts of the relations of the threat assessment RDB for the schema in Fig. 3. As
shown in Table 1, we choose six relations (Vehicle, Time, Region, VehicleType, Location, and
Situation), which are used for an illustrative example through the next section. For example, the
relation Vehicle contains a primary key VID. The relation VehicleType contains a primary key v/Vehicle.VID, which is a foreign key from the primary key VID in the relation Vehicle, and an attribute VehicleType.
Initial inputs of the process can be needs and/or missions from stakeholders in a certain domain
(e.g., predictive situation awareness, planning, natural language processing, and system
modeling). In the Analyze Requirements step, specific requirements for a reasoning model (in our
case, an MTheory) are identified. Depending on the domain type, the goals and the reasoning model will differ. For example, the goal of the reasoning model for the PSAW
domain can be to identify a threatening target. The reasoning model in such a domain may contain
sensor models representing sensing noise. The goal of the reasoning model for the natural
language processing domain can be to analyze natural languages or classify documents. The
reasoning model in such a domain may contain random variables specifying a text corpus. In the
Define World Model step, a target world where the reasoning model operates is defined. In the
Construct Reasoning Model step, a training dataset can be an input for MEBN learning to learn a
reasoning model. In the Test Reasoning step, a test dataset can be an input for the evaluation of
the learned reasoning model. An output of the process is the evaluated reasoning model. The
following subsections describe these four steps with the illustrative example (Section 3) of threat
assessment in the PSAW domain.
4.1.2 Identify Queries/Evidence
The queries are specific questions for which the reasoning model is used to estimate and/or
predict answers. The evidence consists of inputs used for reasoning. From these sub-steps, a set of
goals, a set of queries for each goal, and a set of evidence for each query are defined. The
following shows an illustrative example of defining a requirement.
Requirement 4.1
Goal 1: Identify characteristics of a target.
Query 1.1: What is the speed of the target at a given time?
Evidence 1.1.1: A speed report from a sensor.
...
Fig. 6 Define World Model
This step decomposes into two sub-steps (Fig. 6): (1) a Define Structure Model step and (2) a
Define Rules step. The Define Structure Model step defines the structure model from the
requirements, domain knowledge and/or existing data schemas. The structure model is used to
identify rules. The Define Rules step defines a rule or an influencing relationship between
attributes (e.g., A and B) in relations for the structure model. The influencing relationship is a
relationship between attributes in which there is an unknown causality between the attributes (e.g.,
influencing(A, B)). If we know the causality, the influencing relationship becomes a causal
relationship (e.g., causal(A, B)). When multiple parent attributes influence a child attribute (or variable), braces are used to indicate the set of parent attributes (e.g., causal({A, B}, C)). The child
attribute is called a Target Attribute (or Variable). Also, the set of rules should satisfy the No-
cycle condition which means that the generated SSBN will contain no directed cycles (Section
2.1).
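The No-cycle condition on a rule set can be checked mechanically. The following sketch runs a depth-first search over the directed graph implied by the rules; the rule encoding as (parents, child) pairs is our own illustration of causal({A, B}, C), not a notation from the paper.

```python
def has_cycle(rules):
    """rules: list of (parents, child) pairs, e.g. [({"A", "B"}, "C")].
    Return True if the implied directed graph parent -> child has a cycle."""
    graph = {}
    for parents, child in rules:
        for p in parents:
            graph.setdefault(p, set()).add(child)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = {}

    def dfs(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            c = color.get(nxt, WHITE)
            if c == GRAY or (c == WHITE and dfs(nxt)):
                return True  # back edge: a directed cycle exists
        color[node] = BLACK
        return False

    return any(dfs(n) for n in graph if color.get(n, WHITE) == WHITE)

print(has_cycle([({"VehicleType"}, "ThreatLevel")]))  # False: Rule 2 is acyclic
print(has_cycle([({"A"}, "B"), ({"B"}, "A")]))        # True: A -> B -> A
```

A rule set passing this check yields an SSBN with no directed cycles, as the No-cycle condition requires.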
type ActualObject, the relation ObserverOf is used as the context type ObserverOf, and the relation Predecessor is used as the context type Predecessor.
In the Define Rules step, a (conditional) local distribution for an attribute (e.g., the speed attribute)
can be defined by expert knowledge. In practice, we may encounter a situation in which there is no
dataset for a rule and all we have is expert knowledge. For example, a conditional local
distribution for the speed attribute given the RV VehicleType can be identified by a domain expert
(e.g., if a vehicle type is wheeled, then the speed of the vehicle on a road is normally distributed
with a mean of 50MPH and a standard deviation of 20MPH). The rules derived in this step are
used in the next step to construct an MTheory and the MTheory will be learned by MEBN
parameter learning.
In this step, we determine whether data can be obtained for the attribute, and if so, either
collect data or identify an existing dataset. We usually divide the data into a training dataset and a
test dataset. If no data can be obtained, we use the judgment of domain experts to specify the
necessary probability distributions. For example, a belief for the target type attribute can be
P(Wheeled) = 0.8 and P(Tracked) = 0.2. If neither data nor expert judgment is available, we consider whether the attribute is really necessary. For this, we can return to the Analyze Requirements step to modify the requirements.
This step decomposes into two sub-steps (Fig. 7): (1) a Map to Reasoning Model step and (2) a
Learn Reasoning Model step. The Map to Reasoning Model step converts the structure model and rules in the world model to an initial reasoning model. The Learn Reasoning Model step uses a machine learning method to learn the model from a training dataset.
Definition 4.1 (Entity-Relationship Normalization) A relation is in Entity-Relationship Normal Form if either its primary key is a single attribute which is not a foreign key, or its primary key contains two or more attributes, all of which are foreign keys.
For example, in the relations in Table 1, we can notice that the relation VehicleType has as its
primary key a single foreign key imported from the relation Vehicle. They (Vehicle and
VehicleType) can be merged into a relation Vehicle. The following table shows the normalized
table. Note that after the Entity-Relationship Normalization, any foreign key in a relation comes
from a certain entity relation (not relationship relation), which has only one attribute for its
primary key, so there is no need to indicate which primary key is used for the entity relation and
we can simplify the notation for a foreign key (e.g., rgn/Region.RID and t/Time.TID). For
example, the notation of the foreign key for the vehicle (i.e., v/Vehicle.VID) in the relation
Location (Table 1) can be simplified as v/Vehicle.
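The merge described above, folding VehicleType into Vehicle because its primary key is a single foreign key, can be sketched as a small dictionary join. The mini-tables below are hypothetical rows in the spirit of Table 1, not the paper's actual data.

```python
# Entity relation Vehicle: primary key VID
vehicle = [{"VID": "v1"}, {"VID": "v2"}]

# Relation VehicleType: its primary key is a single foreign key v -> Vehicle.VID
vehicle_type = [{"v": "v1", "VehicleType": "Tracked"},
                {"v": "v2", "VehicleType": "Wheeled"}]

# Entity-Relationship Normalization: merge the two into one Vehicle relation
types_by_vid = {row["v"]: row["VehicleType"] for row in vehicle_type}
normalized_vehicle = [{"VID": r["VID"], "VehicleType": types_by_vid[r["VID"]]}
                      for r in vehicle]
print(normalized_vehicle)
```

After the merge, VehicleType is just an attribute of the entity relation Vehicle, as in the normalized table.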
The initial MTheory, which is directly derived from an RM using MEBN-RM, can be learned
using a dataset for each relation associated with the MFrag in the initial MTheory. More
specifically, the parameter for the distribution of each RV in the MFrag is learned from a
corresponding dataset of the relation for the MFrag. For example, the MFrag Situation is derived
from the relation Situation in Table 2. The parameter for the distribution of the variable
ThreatLevel (Line 23 in MTheory 4.1) can be learned from the dataset of the attribute
ThreatLevel (Table 2). An RV (e.g., ThreatLevel) in MEBN can contain a default distribution
which is used for reasoning, for cases in which none of the conditions associated with parent RVs
is valid. In MEBN, the parameter for the default distribution should be learned from a dataset
containing such cases.
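For a discrete RV such as ThreatLevel, the parameter learning described above reduces, in the simplest case, to maximum-likelihood counting over the relation's column. This is a sketch of that base case only (the data column is hypothetical), not the paper's full learning procedure.

```python
from collections import Counter

def learn_categorical(values):
    """Maximum-likelihood estimate of a categorical distribution from a data column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

# Hypothetical column of the ThreatLevel attribute in the relation Situation
column = ["High", "High", "Low", "High"]
print(learn_categorical(column))  # {'High': 0.75, 'Low': 0.25}
```

The default distribution of an RV would be estimated the same way, but from the subset of cases in which no parent condition applies.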
3 Pa(X) is the set of parent nodes of the node X.
of the variable Speed depends on the values of variables PreviousSpeed and VehicleType. To
learn the parameter for the variable Speed in this situation, it is not enough to use only the dataset from the relation Speed, because that dataset does not contain information associated with the variable VehicleType. Therefore, relations related to each target variable and its parent variables
should be joined. For this purpose, we need to define a joining rule. The variable PreviousSpeed
indicates a variable Speed which happens just before a current time, so the relation Predecessor,
which indicates a previous time and a current time, is also used for this joining for Rule 1. In
other words, the relations Speed, VehicleType, and Predecessor are joined.
Case  Vehicle.VehicleType  Location.v / Vehicle.VID  Location.t / Situation.t  Location.Location / Situation.rgn  Situation.ThreatLevel
1 Tracked Vehicle13 Time18 Region6 High
2 Tracked Vehicle15 Time21 Region7 High
3 Tracked Vehicle17 Time24 Region8 Low
4 Tracked Vehicle19 Time27 Region9 High
5 Wheeled Vehicle21 Time30 Region10 High
6 Wheeled Vehicle23 Time33 Region11 Low
7 Wheeled Vehicle0 Time2 Region0 High
8 Tracked Vehicle1 Time2 Region0 High
9 Tracked Vehicle2 Time5 Region1 Low
10 Wheeled Vehicle3 Time5 Region1 Low
11 Tracked Vehicle4 Time8 Region2 High
12 Tracked Vehicle5 Time8 Region2 High
13 Tracked Vehicle6 Time11 Region3 Low
14 Tracked Vehicle7 Time11 Region3 Low
15 Tracked Vehicle8 Time14 Region4 High
16 Tracked Vehicle9 Time14 Region4 High
17 Wheeled Vehicle10 Time17 Region5 High
18 Wheeled Vehicle11 Time17 Region5 High
There are several joining rules (e.g., Cartesian Product, Outer Join, Inner Join, and Natural Join)
[Date, 2011]. Table 3 shows an illustrative example of a joined dataset derived from Table 2
using Inner Join. Inner Join produces all tuples from relations as long as there is a match between
values in the columns being joined. Table 3 shows the result of performing an inner join of the
relations Situation and Vehicle through the relation Location and then selecting the columns to be
used for learning. The rows (or tuples) in the relations Situation and Vehicle are joined when rows
of the attributes v/Vehicle, t/Time, and Location/Region in the relation Location match rows of
the attribute VID in the relation Vehicle and rows of the attributes rgn/Region and t/Time in the
relation Situation. The first column denotes cases for the matched rows. The second column
(Vehicle.VehicleType) denotes the rows from the attribute VehicleType of the relation Vehicle in
Table 2. The third column (Location.v and Vehicle.VID) denotes the matched rows between the
attribute v from the relation Location and the attribute VID from the relation Vehicle. The fourth
column (Location.t and Situation.t) denotes the matched rows between the attribute t from the
relation Location and the attribute t from the relation Situation. The fifth column
(Location.Location and Situation.rgn) denotes the matched rows between the attribute Location
from the relation Location and the attribute rgn from the relation Situation. The sixth column
(Situation.ThreatLevel) denotes the rows from the attribute ThreatLevel from the relation
Situation.
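The inner join producing Table 3 can be sketched over miniature versions of the three relations. The rows below follow the schema described above but are illustrative values only.

```python
# Miniature relations in the style of Table 2 (hypothetical rows)
vehicle = [{"VID": "Vehicle10", "VehicleType": "Wheeled"}]
location = [{"v": "Vehicle10", "t": "Time17", "Location": "Region5"}]
situation = [{"rgn": "Region5", "t": "Time17", "ThreatLevel": "High"}]

# Inner join: Vehicle and Situation are linked through Location,
# matching Location.v = Vehicle.VID, Location.t = Situation.t,
# and Location.Location = Situation.rgn
joined = [{"VehicleType": veh["VehicleType"], "ThreatLevel": sit["ThreatLevel"]}
          for loc in location
          for veh in vehicle if veh["VID"] == loc["v"]
          for sit in situation
          if sit["rgn"] == loc["Location"] and sit["t"] == loc["t"]]
print(joined)
```

Only tuples with matching join-column values survive, which is exactly the Inner Join behavior used to build the learning dataset.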
Table 3 shows the joined dataset for the attributes VehicleType and ThreatLevel. Now, let us
assume that the attribute ThreatLevel will be a target variable depending on the variable
VehicleType (i.e., Rule 2: causal(VehicleType, ThreatLevel)). For each instance of the target
variable ThreatLevel, Table 3 provides relevant information about all the configurations of its
parents (i.e., the parent variable VehicleType). For example, the threat level in the situation at Region5 at Time17 is High (i.e., Cases 17 and 18). The value High is associated
with the wheeled Vehicle10 and the wheeled Vehicle11. In other words, two parent instances (i.e.,
the wheeled Vehicle10 and the wheeled Vehicle11) influence the target instance (i.e., the value
High). The following shows a query script 4 which is an example using Inner Join for Table 3.
SQL script 4.1 joins the relations Situation and VehicleType through the relation Location. In
other words, the rows (or tuples) in the relations Situation and VehicleType are joined as shown in Table 3, in which the two attributes (VehicleType, ThreatLevel) are connected through the
attributes of the relation Location. The joined table shows how the dataset of the attribute
VehicleType and the dataset of the attribute ThreatLevel are linked.
We have introduced how to join relations according to given rules. In the following, we discuss
how to update an MFrag from the given rules. The initial threat assessment MTheory (MTheory
4.1) was constructed by MEBN-RM. Each MFrag in the initial MTheory contains resident nodes
without any causal relationships between them. The given rules specify such causal relationships
between the resident nodes. Therefore, an MFrag in the initial MTheory may be updated to reflect
the causal relationships introduced by the given rules. This process consists of three steps:
Construct input/parent nodes, Construct context nodes, and Refine context nodes.
In MTheory 4.1, for the target variable ThreatLevel in the MFrag Situation, the parent
VehicleType is defined in the MFrag Vehicle. The parent variable VehicleType should therefore
be an input node in the MFrag Situation. The following MFrag shows the updated result for the
MFrag Situation using Rule 2.
The primary key for VehicleType is VID, associated with the entity VEHICLE, so IsA (v,
VEHICLE) is added to the updated MFrag Situation (MFrag 4.1).
The primary keys for the attribute Location are v and t, so the IsA context nodes IsA (v, VEHICLE)
and IsA (t1, TIME) are added to MFrag 4.2.
The Learn Reasoning Model step applies MTheory learning from relational data. In this research,
we focus on MEBN parameter learning given a training dataset D in RM and an initial MTheory
M. Before introducing MEBN parameter learning, some definitions are introduced in the
following subsections.
4.3.2 Definitions for Class Local Distribution and Instance Local Distribution
We introduced Definition 2.2 (MFrag), Definition 2.3 (MNode), and Definition 2.4 (MTheory)
for MEBN in Section 2. An MTheory M is composed of a set of MFrags (i.e., M = {F1, F2, ... , Fn})
satisfying certain conditions (e.g., no-cycle, bounded causal depth, unique home MFrags, and the
recursive specification condition [Laskey, 2008]) discussed in Section 2. An MFrag F is composed
of a set of MNodes N and a graph G over N (i.e., F = {N, G}). An MNode is composed of a
function or predicate symbol of FOL f and a class local distribution L (i.e., N = {f, L}).
A CLD specifies how to define local distributions for instantiations of the MNode. The following
CLD 4.1 and ILD 4.1 show illustrative examples for a CLD (Class Local Distribution) and an
ILD (Instance Local Distribution), respectively (recall that these examples were discussed in
Section 2). CLD 4.1 defines a distribution for the threat level in a region. If there are no tracked
vehicles, the default probability distribution described in Line 6 is used. The default probability
distribution in a CLD is used for ILDs generated from the CLD, when no nodes meet the
conditions defined in the MFrag for parent nodes.
This CLD is composed of pairs of a class parent condition CPCi and a class-sub-local distribution
CSDi. A CPC specifies the condition under which its associated CSD is valid. The CSD (class-
sub-local distribution) is a sub-probability distribution which specifies how to define a local
distribution under a condition in an RV derived from an MNode. For example, the first line in
CLD 4.1 is CPC1, which specifies the condition for the first class-sub-local distribution CSD1. In
this case, the condition means "if there is an object whose type is Tracked". If this condition is
satisfied (i.e., CPC1 is valid), then CSD1 is used. A CPC can also be used for a default probability
distribution. In such a case, it is called a default CPC, denoted CPCd, and the CSD associated with
CPCd is called a default CSD, denoted CSDd.
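As a hedged sketch of this structure, a CLD can be represented as an ordered list of (CPC, CSD) pairs plus a default CSD; the predicates and parameter values below are illustrative assumptions, not values from this paper:

```python
# A CLD sketched as (CPC, CSD) pairs with a trailing default CSD.
# The predicates and probabilities are placeholders, not learned values.
def cpc_tracked(parents):
    # CPC1: "there is an object whose type is Tracked"
    return any(p == "Tracked" for p in parents)

def cpc_wheeled(parents):
    # CPC2: all detected objects are Wheeled (and there is at least one)
    return bool(parents) and all(p == "Wheeled" for p in parents)

CLD = [
    (cpc_tracked, {"High": 0.8, "Low": 0.2}),   # (CPC1, CSD1)
    (cpc_wheeled, {"High": 0.3, "Low": 0.7}),   # (CPC2, CSD2)
]
DEFAULT_CSD = {"High": 0.1, "Low": 0.9}         # (CPCd, CSDd)

def local_distribution(parents):
    """Select the CSD whose CPC is valid; fall back to the default CSD."""
    for cpc, csd in CLD:
        if cpc(parents):
            return csd
    return DEFAULT_CSD

print(local_distribution(["Tracked", "Wheeled"]))  # CPC1 is valid, so CSD1 is used
print(local_distribution([]))                      # no CPC is valid, so CSDd is used
```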
For this case, we assume that the MNode has two states (High and Low) and the discrete parent
RV VehicleType(v) has two states (Tracked and Wheeled). The pair of CPC1 and CSD1 (in Lines 1
and 2) is for VehicleType(v) = Tracked. The pair of CPC2 and CSD2 (in Lines 3 and 4) is for
VehicleType(v) = Wheeled. The pair of CPCd and CSDd (in Lines 5 and 6) is for the default
distribution.
The following ILD 4.1 shows the ILD derived from the above CLD given one region entity
region1 and one vehicle entity v1. Like the CLD, the ILD is composed of an instance parent
condition IPCi and an instance-sub-local distribution ISDi. The IPC specifies the condition under
which the ISD associated with it is valid. The ISD is a probability distribution defined in an ILD
of a random variable.
Now, consider a situation in which there is a region containing no vehicles. In this case, the
default probability distribution in CLD 4.1 is used for such an ILD (i.e., ILD 4.2), because none
of the conditions associated with parent nodes (i.e., CPC1 and CPC2 in CLD 4.1) is valid.
ILD 4.2: Default ILD with one region without any vehicle
1 P(ThreatLevel_region1) =
2 IPC1: {
3 ISD1: High = Ɵd.1; Low = Ɵd.2;
4 }
We name CLD 4.2 an Inverse Cardinality Average. Thus, the type of this class local distribution
is the inverse cardinality average (i.e., TYPE(CLD 4.2) = Inverse Cardinality Average CLD).
CLD 4.2 consists of two CSDs (CSD1 and CSDd). CSD1 contains a parameter θ, where 0 < θ < 1,
as shown in CLD 4.2. CLD 4.2 represents probabilistic knowledge of how the threat level of a
region is measured depending on the vehicle types of detected objects. For example, if a region
contains many tracked vehicles (e.g., tanks), the threat level of the region at a certain time will
be high. The influence counting (IC) function CARDINALITY(obj) returns the number of
tracked vehicles among the parent nodes. If there are many tracked vehicles, the probability of the
state High increases. If there are no tracked vehicles, the default probability distribution (i.e., CSDd)
described in Line 4 is used for the CLD of the MNode ThreatLevel(rgn, t); this represents a
peacetime situation.
Here is another CLD example. CLD 4.3 shows the case of a continuous CLD with hybrid parents.
For this case, we assume that there is an MNode Range(v, t) which is a parent node of the
MNode ThreatLevel(rgn, t) and denotes the range between the region rgn and the vehicle v at a
time t.
The meaning of CLD 4.3 is that the threat level in the region is the number of tracked vehicles
divided by the average of the ranges of the vehicles, plus a normally distributed error with a
mean of Ɵ and a variance of 5. If there are no tracked vehicles, the default probability distribution,
NormalDist(10, 5), described in Line 4 is used. If there are continuous parents, various
numerical aggregating (AG) functions (e.g., average, sum, and multiply) can be used. For
example, if there are three continuous parents Range1, Range2, and Range3, the aggregating
functions average, sum, and multiply construct three IPDs: IPD1 = (Range1 + Range2 + Range3)/3,
IPD2 = Range1 + Range2 + Range3, and IPD3 = Range1 * Range2 * Range3, respectively.
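A minimal sketch of the three aggregating functions named above (the Python function names are ours):

```python
import math

# The three numerical aggregating (AG) functions mentioned in the text,
# applied to the values of continuous parent instances.
def ag_average(values):
    return sum(values) / len(values)

def ag_sum(values):
    return sum(values)

def ag_multiply(values):
    return math.prod(values)

ranges = [2.0, 4.0, 6.0]   # e.g., Range1, Range2, Range3 (illustrative values)
print(ag_average(ranges))  # 4.0
print(ag_sum(ranges))      # 12.0
print(ag_multiply(ranges)) # 48.0
```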
The above CLDs 4.2 and 4.3 are based on an influence counting (IC) function for discrete parents
and an aggregating (AG) function for continuous parents. Using such functions is related to the
aggregating influence problem, which concerns how to handle many instances of a parent RV.
CLD 4.1 uses a very simple aggregation rule that treats all counts greater than zero as
equivalent. In other words, a shared parameter in a CSD is learned from all instances of the parent
RV with counts greater than zero. For example, with CLD 4.1, suppose that there are two cases:
in Case 1, there is one tracked vehicle, and in Case 2, there are two tracked vehicles. For Case 1,
one VehicleType RV is constructed and CSD1 (Line 1) in CLD 4.1 is used for the parameter of the
distribution for ThreatLevel. For Case 2, two VehicleType RVs are constructed and CSD1
(Line 1) in CLD 4.1 is again used for the parameter of the distribution for ThreatLevel, although
there are two tracked vehicles. Thus, the shared parameter (i.e., High = Ɵ1.1 and Low = Ɵ1.2) for
CSD1 in CLD 4.1 is used regardless of the number of parent instances (i.e., one vehicle in
Case 1, two vehicles in Case 2, and so on). In the following sections, we use this simple
aggregation rule for MEBN parameter learning.
Table 3 is a joined dataset for the common CPCs (i.e., CPC1 and CPC2). It can be sorted according
to each CPC as shown in Table 4. For example, CPC1 in CLD 4.1 is valid only
if a case contains a tracked vehicle. Therefore, by CPC1, we can sort the joined dataset in Table 3.
Thus, Cases 1, 2, 3, 4, 8, 9, 11, 12, 13, 14, 15, and 16 are selected for CSD1, while the other cases
are used for CSD2 (Table 4). We call this dataset a CSD dataset.
Definition 4.4 (CSD Dataset) Let there be a dataset D = {C1, C2, … , Cn}, where Ci is each case
(or row), and a CLD L = {(CPC1, CSD1), (CPC2, CSD2),…, (CPCm, CSDm)}. A CSD Dataset (CD)
is a dataset which is grouped by matching each class parent condition CPCj of L and each case Ci
in D. The set of grouped cases GCj = {C1, C2, … , Cl} is assigned to a corresponding class parent
condition CPCj.
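Definition 4.4 can be sketched as a grouping routine; the case fields and CPC predicates below are illustrative assumptions, not the actual Table 4 data:

```python
# Grouping a dataset into a CSD dataset (Definition 4.4): each case is assigned
# to the first class parent condition (CPC) it matches.
def build_csd_dataset(cases, cpcs):
    """cases: list of dicts; cpcs: list of (name, predicate) pairs.
    Returns a dict mapping each CPC name to its grouped cases GC."""
    groups = {name: [] for name, _ in cpcs}
    for case in cases:
        for name, pred in cpcs:
            if pred(case):
                groups[name].append(case)
                break
    return groups

# Illustrative cases and CPCs for the ThreatLevel example.
cases = [
    {"VehicleType": "Tracked", "ThreatLevel": "High"},
    {"VehicleType": "Wheeled", "ThreatLevel": "Low"},
    {"VehicleType": "Tracked", "ThreatLevel": "High"},
]
cpcs = [
    ("CPC1", lambda c: c["VehicleType"] == "Tracked"),
    ("CPC2", lambda c: c["VehicleType"] == "Wheeled"),
]
grouped = build_csd_dataset(cases, cpcs)
print(len(grouped["CPC1"]), len(grouped["CPC2"]))  # 2 1
```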
For an RV, if there are cases for which the conditions associated with the parent RVs are not
satisfied, the dataset for a default CPC is required. The dataset for the default CPC (i.e., CPCd)
can be obtained by excluding the joined dataset from the original dataset. This is necessary
because we need a dataset which doesn’t include cases for which the conditions associated with
the parent RVs are satisfied. For example, in Table 2, there is an original dataset for the
ThreatLevel RV (i.e., the dataset in the relation Situation). Table 3 shows a joined dataset
associated with CPC1 and CPC2. The dataset for the default CPCd can be derived by subtracting
the joined dataset (Table 3) from the original dataset (Table 2). For example, the following is a
SQL script to extract the default dataset for the ThreatLevel RV from the joined dataset.
SQL script 4.2: SQL script for the default dataset of the ThreatLevel RV
1 SELECT
2   Situation.rgn, Situation.t, Situation.ThreatLevel
3 FROM Situation
4 WHERE NOT EXISTS (
5   SELECT *
6   FROM Location, Vehicle
7   WHERE
8     Situation.rgn = Location.Location AND
9     Situation.t = Location.t AND
10    Vehicle.VID = Location.v
11 )
The dataset for the ThreatLevel RV comes from the relation Situation (Line 3). When the dataset
is selected, a condition (Line 4) requires that the dataset not include the joined dataset derived by
Lines 5~11. Using this script, the default dataset for the ThreatLevel RV is obtained; it represents
the threat level at a region where there is no vehicle.
In the following subsections, a training dataset D means the CSD dataset for a certain CLD.
For discrete RVs, the Dirichlet distribution is commonly used because it is conjugate to the
multinomial distribution. With a
Dirichlet prior distribution, the posterior predictive distribution has a simple form [Heckerman et
al., 1995][Koller & Friedman, 2009].
As an illustrative example of Dirichlet parameter learning for a CLD, we use
CLD 4.1. Parameter learning for this CLD is to estimate CSD1's parameters (Ɵ1.1 and Ɵ1.2),
CSD2's parameters (Ɵ2.1 and Ɵ2.2), and CSDd's parameters (Ɵd.1 and Ɵd.2). To estimate these
parameters, we can use the following predictive distribution with a Dirichlet conjugate prior,
discussed in Appendix A. Equation 4.1 shows the posterior predictive distribution for the value xk
of the RV X given a parent value a, the dataset D, and a hyperparameter α for the Dirichlet
conjugate prior.

P(X = xk | a, D, α) = (C[xk, a] + α_xk|a) / Σ_{q=1..N} (C[xq, a] + α_xq|a),   (4.1)

where xk ∈ Val(X), a ∈ Val(Pa(X) = A), C[xq, a] is the number of times the outcome xq of X
appears in D together with its parent outcome a of A, α = {α_x1|a, ... , α_xN|a} is the
hyperparameter, and N = |Val(X)|.
For the case of CPC1 and CSD1, we can use the set of grouped cases GC1 in Table 4 as a
training dataset. CSD1 has two parameters, Ɵ1.1 (for High) and Ɵ1.2 (for Low). For the
parameter Ɵ1.1, we can use Equation 4.1 as Ɵ1.1 = P(ThreatLevel = High | VehicleType =
Tracked, D = GC1, α), where α = {α_High|Tracked, α_Low|Tracked}. If there were previously one
case for High|Tracked and two cases for Low|Tracked, then α_High|Tracked = 1 and α_Low|Tracked = 2 are
used. This approach is used again for the case of CPC2 and CSD2. To learn the parameters for
CSDd, the default dataset discussed in Section 4.3.3 is required. The parameters Ɵd.1 and Ɵd.2 can
be learned from the default dataset using Equation 4.1, as in the case of CPC1 and CSD1.
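A hedged sketch of this estimation, with illustrative counts rather than the actual counts from Table 4:

```python
# Posterior predictive under a Dirichlet prior (Equation 4.1 in the text):
# P(x_k | a, D, alpha) = (C[x_k, a] + alpha_k) / sum_q (C[x_q, a] + alpha_q).
def dirichlet_predictive(counts, alpha):
    """counts, alpha: dicts mapping value -> count / pseudo-count."""
    total = sum(counts[v] + alpha[v] for v in counts)
    return {v: (counts[v] + alpha[v]) / total for v in counts}

# Illustrative counts for ThreatLevel given VehicleType = Tracked (not from Table 4),
# with the prior pseudo-counts from the text's example (1 for High, 2 for Low).
counts = {"High": 9, "Low": 3}
alpha = {"High": 1, "Low": 2}
theta = dirichlet_predictive(counts, alpha)
print(theta["High"])  # (9 + 1) / (12 + 3) = 2/3
```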
Parameter learning for this CLD (CLD 4.4) is to estimate CSD1's parameters (Ɵ1.0, Ɵ1.1 and Ɵ1.2),
CSD2's parameters (Ɵ2.0, Ɵ2.1 and Ɵ2.2), and CSDd's parameters (Ɵd.0, Ɵd.1 and Ɵd.2). We can write this
situation more formally. If X is a continuous node with n continuous parents U1, ..., Un and m
discrete parents A1, ..., Am, then the conditional distribution p(X | u, a) given parent states U = u
and A = a has the following form:

p(X | u, a) = N(L(a)(u), σ(a)),

where L(a)(u) = m(a) + b1(a)u1 + ⋯ + bn(a)un is a linear function of the continuous parents, with
intercept m(a), coefficients bi(a), and standard deviation σ(a) that depend on the state a of the
discrete parents. Given CPCj (i.e., given the state aj), estimating the intercept m(aj),
coefficients bi(aj), and standard deviation σ(aj) corresponds to estimating the CSD's parameters Ɵj.0,
Ɵj.1 and Ɵj.2, respectively.
The following shows multiple linear regression, adapted from [Rencher, 2003]. L(a)(u) can be
rewritten if we suppose that there are k observations (note that within one CSD case we can omit
the state a, because it is fixed):

li = m + b1 ui1 + ⋯ + bn uin + σi,   i = 1, ..., k,

where i indexes the observations. For convenience, we can write the above equation more
compactly using matrix notation:

l = Ub + σ,

where l denotes the vector of observed instances, U denotes the matrix containing all
continuous parents in the observations (with a leading column of ones for the intercept), b denotes
the vector containing the intercept m and the set of coefficients bi, and σ denotes the vector of
regression residuals. From this setting, we can derive the optimal vector b̂ for the intercept and
the set of coefficients:

b̂ = (UᵀU)⁻¹Uᵀl.

Also, we can derive the optimal standard deviation σ̂ from the above linear algebra terms
[Rencher, 2003]:

σ̂ = sqrt( (l − Ub̂)ᵀ(l − Ub̂) / (k − n − 1) ).   (4.7)
Using the above equations, the optimal parameters can be estimated. For CPC1 in CLD 4.4, CSD1
can be written as follows:

p(X | Speed, MTI_Condition = Good) = N(Ɵ1.0 + Speed · Ɵ1.1(Good), Ɵ1.2(Good)).
In this section, we discussed how to learn parameters for the conditional linear Gaussian CLD
using linear regression. For a conditional nonlinear Gaussian CLD, we can use nonlinear
regression. We did not consider incremental parameter learning for the conditional linear
Gaussian CLD; for this, we can use Bayesian regression [Press, 2003], which is more robust to
overfitting than traditional multiple regression.
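A hedged sketch of the least-squares fit for a CSD with one continuous parent, using the k − n − 1 denominator of Equation 4.7; the observations below are made up for illustration:

```python
import math

# Fitting a CSD of a conditional linear Gaussian CLD by ordinary least squares
# with a single continuous parent (n = 1), e.g. Speed in CSD1 of CLD 4.4.
def fit_clg(u, l):
    """Return (intercept m, slope b, residual std sigma), with the
    k - n - 1 denominator of Equation 4.7."""
    k, n = len(u), 1
    mean_u, mean_l = sum(u) / k, sum(l) / k
    b = sum((ui - mean_u) * (li - mean_l) for ui, li in zip(u, l)) \
        / sum((ui - mean_u) ** 2 for ui in u)
    m = mean_l - b * mean_u
    residuals = [li - (m + b * ui) for ui, li in zip(u, l)]
    sigma = math.sqrt(sum(r * r for r in residuals) / (k - n - 1))
    return m, b, sigma

# Illustrative observations (parent Speed -> child value), not real data.
speed = [1.0, 2.0, 3.0, 4.0]
child = [2.1, 4.0, 5.9, 8.1]
m, b, sigma = fit_clg(speed, child)
print(round(m, 2), round(b, 2))  # 0.05 1.99
```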
The above relationship relations (i.e., Communicate and Meet) show the true cases for the
predicates. For example, the relation Communicate contains the true cases {{v1, v2}, {v2, v3},
{v3, v4}}. However, relationship relations do not explicitly represent the false cases for the
predicates. By converting the above relations to the following relations, we can see the false cases
explicitly. This conversion is justified by the Closed-World Assumption (CWA): if a case in a
relationship relation is not known to be true, it is assumed to be false.
Vehicle: VID = {v1, v2, v3, v4}

Communicate (VID1/Vehicle, VID2/Vehicle, Communicate):
(v1, v2, True), (v1, v3, False), (v1, v4, False), (v2, v3, True), (v2, v4, False), (v3, v4, True)

Meet (VID1/Vehicle, VID2/Vehicle, Meet):
(v1, v2, True), (v1, v3, False), (v1, v4, True), (v2, v3, True), (v2, v4, False), (v3, v4, False)
The relation Vehicle in Table 6 contains four vehicle entities (v1 ~ v4). These entities can be used
to generate the possible combinations of two vehicles interacting with each other, as shown in the
first and second columns of the relations Communicate and Meet (i.e., {{v1, v2}, {v1, v3}, {v2,
v3}, {v1, v4}, {v2, v4}, {v3, v4}}). The relation Communicate lists the possible combinations of
two vehicles and contains an attribute Communicate indicating whether the two vehicles
communicate (True) or not (False). From the data in the relation Communicate in Table 5, the
true cases for the attribute Communicate in the relation Communicate in Table 6 can be derived.
The true cases for the attribute Meet in the relation Meet in Table 6 are derived using the same
approach. Now, as we can see in Table 6, the relations Communicate and Meet explicitly contain
the true and false cases for the attributes Communicate and Meet, respectively.
To construct the set of combinations of the four vehicles in the relation Vehicle, we can use
the following script.
The above script generates a new relation called All_Vehicles. The dataset for the relation
All_Vehicles contains {{v1, v2}, {v1, v3}, {v2, v3}, {v1, v4}, {v2, v4}, {v3, v4}}. The script
selects each combination of two of the four vehicles exactly once. To generate the dataset for the
relation Communicate in Table 6, we can use the following script.
The above script compares data between the relations All_Vehicles and Communicate. If the
same primary key exists in both, the value True is assigned to the attribute Communicate; if not,
the value False is assigned. To generate the dataset for the relation Meet in Table 6, we can use
the same approach.
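The pair generation and CWA conversion above can be sketched in a few lines; the true cases below come from the relation Communicate in Table 5:

```python
from itertools import combinations

# Enumerate every vehicle pair once (the relation All_Vehicles) and, under the
# Closed-World Assumption, mark pairs absent from Communicate as False.
vehicles = ["v1", "v2", "v3", "v4"]
communicate_true = {("v1", "v2"), ("v2", "v3"), ("v3", "v4")}   # from Table 5

all_vehicles = list(combinations(vehicles, 2))
communicate = [(v1, v2, (v1, v2) in communicate_true) for v1, v2 in all_vehicles]
print(communicate)  # six (VID1, VID2, Communicate) tuples, as in Table 6
```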
For the relations in Table 6, we assume the following CLD 4.5, in which a meeting between two
vehicles may influence communication between them (i.e., P(Communicate |
Meet)). In CLD 4.5, CPC1 (Line 1) specifies the condition in which two vehicles meet. CPC2 (Line 3)
specifies the condition in which two vehicles do not meet. For example, CSD2 (Line 4) represents
the probability that two vehicles VID1 and VID2 communicate with each other in a situation
where the two vehicles are not nearby.
To learn the parameters in CLD 4.5, CSD datasets for CPC1 and CPC2 are required. To generate
such datasets, the processes in Section 4.3.1 (Map to Reasoning Model) can be used. For example,
a joined dataset between the relations Communicate and Meet is generated by matching the same
vehicle entities in both relations. The joined dataset contains four attributes: VID1, VID2,
Communicate, and Meet (e.g., {{v1, v2, True, True}, ..., {v3, v4, True, False}}). Then, parameter
learning as described in Section 4.3.4 (Parameter Learning) is used to estimate the parameters in
CLD 4.5 (i.e., P(Communicate | Meet)).
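With the simple counting approach, the parameters of P(Communicate | Meet) can be estimated directly from the joined dataset; the sketch below uses the six pairs of Table 6:

```python
# Estimating P(Communicate = True | Meet) by counting in the joined dataset.
# Tuples are (VID1, VID2, Communicate, Meet), mirroring Table 6.
joined = [
    ("v1", "v2", True, True), ("v1", "v3", False, False),
    ("v1", "v4", False, True), ("v2", "v3", True, True),
    ("v2", "v4", False, False), ("v3", "v4", True, False),
]

def p_communicate_given_meet(meet_state):
    cases = [comm for _, _, comm, meet in joined if meet == meet_state]
    return sum(cases) / len(cases)

print(p_communicate_given_meet(True))   # 2 of 3 meeting pairs communicate
print(p_communicate_given_meet(False))  # 1 of 3 non-meeting pairs communicates
```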
In (1) the Analyze Requirements step, there are three sub-steps: (1.1) the Identify Goals step, (1.2)
the Identify Queries/Evidence step, and (1.3) the Define Performance Criteria step. The goals
representing the missions of the reasoning model are defined in (1.1). The queries (specific
questions for which the reasoning model is used to estimate and/or predict answers) and the
evidence (inputs used for reasoning) are defined in (1.2). Each query should include performance
criteria (1.3) for evaluation of reasoning.
In (2) the Define World Model step, there are two sub-steps: (2.1) the Define Structure Model step
and (2.2) the Define Rules step. The Define Rules step (2.2) contains two sub-steps: (2.2.1) the
Define Causal Relationships between RVs step and (2.2.2) the Define Distributions of RVs step.
In (2.2.1), candidate causal relationships (e.g., influencing(A, B) and causal(A, B)) between RVs
are specified using expert knowledge. In (2.2.2), a (conditional) local distribution of an RV is
defined by expert knowledge.
In (3) the Construct Reasoning Model step, there are two sub-steps: (3.1) the Map to Reasoning
Model step and (3.2) the Learn Reasoning Model step. The Map to Reasoning Model step (3.1) is
composed of three sub-steps: (3.1.1) the Perform Entity-Relationship Normalization step, (3.1.2)
the Perform MEBN-RM Mapping step, and (3.1.3) the Update Reasoning Model using the Rules
step. Before applying MEBN-RM to a relational model, the relational model is normalized using
Entity-Relationship Normalization (3.1.1). In (3.1.2), MEBN-RM is performed to construct an
initial MTheory from the relational model. In (3.1.3), the initial MTheory is updated according to
the rules defined in (2.2). The Update Reasoning Model using the Rules step (3.1.3) contains four
sub-steps: (3.1.3.1) the Join Relations step, (3.1.3.2) the Construct Input/Parent Nodes step,
(3.1.3.3) the Construct Context Nodes step, and (3.1.3.4) the Refine Context Nodes step. In
(3.1.3.1), some relations are joined and an updated MFrag is created, if RVs in a rule are defined
in different relations. The causal relationships for the RVs in the rule are defined in the updated
MFrag through (3.1.3.2). In (3.1.3.2), if there is an input node, ordinary variables associated with
the input node are defined in the updated MFrag. In (3.1.3.3), the context nodes associated with
the RVs in the rule are defined in the updated MFrag. For this, the conditions (specified by a
“Where” conditioning statement in SQL) in a joining script, used for joining relations in (3.1.3.1),
can be reused to construct such context nodes. In (3.1.3.4), ordinary variables sharing the same
entity (e.g., IsA (t, TIME) and IsA (t1, TIME)) are converted into a single ordinary variable (e.g.,
IsA (t, TIME)). Then, equal-context nodes (e.g., t = t1) for such ordinary variables are removed.
In (3.2) the Learn Reasoning Model step, a parameter learning algorithm is applied to each RV in
the updated MFrag using a training dataset to generate the parameters of the distribution for the
RV.
In (4) the Test Reasoning Model step, there are two sub-steps: (4.1) the Conduct Experiments for
Reasoning Model step and (4.2) the Evaluate Experimental Results step. In (4.1) there are two
sub-steps: (4.1.1) the Test Reasoning Model from Test Dataset step and (4.1.2) the Measure
Performance for Reasoning Model step. In (4.1.1), the MTheory learned in (3) the Construct
Reasoning Model step is tested using a test dataset, and in (4.1.2) performance is measured by
comparing results from the learned MTheory with the ground truth data in the test dataset. In (4.2)
the Evaluate Experimental Results step, whether the learned MTheory is accepted or not is
decided using the performance criteria defined in (1.3).
In this research, some steps in HML are automated (e.g., (3.1.2) the Perform MEBN-RM Mapping
step), while some other steps are not yet automated (e.g., (3.1.1) the Perform Entity-Relationship
Normalization step) but could be automated. Still other steps (e.g., (1.1) the Identify Goals
step) require human aid (i.e., are human-centric). The following table shows the level of
automation (i.e., Automated, Automatable, and Human-centric) for each step in HML.
For example, the (3.1.2) Perform MEBN-RM Mapping step is automated by the MEBN-RM
mapping algorithm (Section 3.6). The (3.1.1) Perform Entity-Relationship Normalization step is
automatable by developing an algorithm that converts ordinary relations into relations
satisfying Entity-Relationship Normalization. The (1.1) Identify Goals step is human-centric and
requires human support. Automatable steps can become automated steps by
developing specific processes, algorithms, and software programs. We leave these as future
studies.
We developed the HML Tool, which performs MEBN-RM and MEBN parameter learning. The
HML Tool is a Java-based open-source program that can be used to create an MTheory script
from relational data. This enables rapid development of an MTheory script by just clicking a
button in the tool. It is available on GitHub 5 (see Appendix B).
5
GitHub is a hosting service for the Git distributed version control system (https://github.com).
The knowledge given to the participants is introduced in Section 5.1.
For the experiment, we performed three processes: preparation, execution, and evaluation. In the
preparation process, we prepared the experimental settings so that both groups had the same
conditions in terms of knowledge and skill for MEBN modeling for the simple heating
machinery. In the execution process, the main experiment for MEBN modeling was conducted:
participants in both groups developed MEBN models using the method assigned to each group.
In the evaluation process, the participants' development times were analysed and the MEBN
models they developed were tested in terms of accuracy using a simulated test dataset.
Steps                          Group A (UMP-ST) / Group B (HML)                            Time
1. Obtain relevant knowledge   Provided a lecture for BN, MEBN, the script form of         4 hours
                               MEBN, and UMP-ST (both groups)
2. Take a short test           Provided a short test for UMP-ST & MEBN (both groups)       30 min
3. Divide into two groups      Graded the test results and selected participants for       1 hour
                               the two groups
4. Obtain HML knowledge        Group A: None; Group B: Provided a lecture for HML          Time was checked (Time A)
Before the execution process, (4) HML lecture was provided to Group B. The time for the lecture
was checked as Time A. The lecture contained the process of HML, the reference PSAW-MEBN
model, and how to use the HML tool.
(5) In the first step of the execution process, both groups were given a stakeholder requirement,
“Develop a MEBN model which is used to predict a total cost given input slabs”. Also, domain
knowledge was given to the participants.
The domain knowledge covered the following information. The simple heater system is
associated with two infrared thermal imaging sensors that sense the temperature of a slab. Each
sensor has a sensing error that is normally distributed with mean zero and variance three, N(0, 3)
(e.g., if a sensor reads 10 ℃, the actual temperature lies between 7.15 ℃ and 12.85 ℃ within the
5th-to-95th-percentile range). The heater system contains an actuator that controls the energy
value used to heat a slab (i.e., the actuator calculates the energy value given the input slab
temperature). There is no energy loss when the energy value is used in the heater. All
manufacturing factors (e.g., the temperature, energy value, and cost) are normally distributed
continuous values. The energy unit is kWh (kilowatt-hour). There is a fixed slab weight of 100 kg.
There is an ordered fixed temperature of 1200 ℃ for an output slab coming from the heater. The
energy cost is 20 cents/kWh.
Also, an idea of how to model the sensor error using a BN was given. For example, to include the
sensor error, two random variables are used: the first for the actual temperature and the second
for the sensed temperature. The actual temperature then influences the sensed temperature
through the error distribution (i.e., N(0, 3)). This can be modeled in a BN as P(sensed temperature
| actual temperature) with sensed temperature = actual temperature + N(0, 3).
For the situation of the simple heating machinery, datasets were generated by a simulator
containing a ground truth model designed by a domain expert. The ground truth model contained
two parts. The first part is an actual model, which represents the physical world that cannot be
observed exactly. The second part is a sensed model, which represents the observed world that
we can see using sensors. Therefore, the datasets were divided into two parts: actual data
and sensed data (Fig. 10). The sensed data (the datasets in the rounded boxes in Fig. 10) were
provided to both groups in two formats: Excel and a relational database (RDB) (Fig. 11). The
actual data (e.g., actual temperatures) were not given to either group.
Fig. 10 Each of training and test data has sensed data and actual data for the simple heating machinery
Also, the simulator generated two datasets (as shown in Fig. 10): One was a training dataset
which was used by the participants to understand the context of the situation and learn a MEBN
model using HML, and another was a test dataset which was used to evaluate the models
developed by the participants in terms of prediction accuracy for the total cost (6.4.3 Evaluation
Process). For this model evaluation, the actual and sensed data in the test dataset were used. For
example, sensed data for the temperature of an input slab were used as evidence for the developed
model and the developed model was used to reason about a predicted total cost. The predicted
total cost was compared with a total cost derived from the actual data in the test dataset.
Participants were requested to (6) develop MEBN model requirements, (7) define the World
Model, and (8) construct the Reasoning Model. The development times (Time B, Time C, and
Time D, respectively) were checked.
Fig. 11 Sensed datasets for the simple heating machinery
Group              Participants                         Average CRPS      Total Development Times (Hours:Minutes)
Group A (UMP-ST)   #1                                   1735.3            1:06
                   #2                                   74.6              2:48
                   #3                                   114.78            2:21
                   Grand Average (Standard Deviation)   641.53 (947.45)   2:05 (0:52)
Group B (HML)      #4                                   45.05             1:02
                   #5                                   45.05             0:58
                   #6                                   40.48             1:28
                   Grand Average (Standard Deviation)   43.53 (2.64)      1:09 (0:16)
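The accuracy column reports an average Continuous Ranked Probability Score (CRPS), where lower is better. As background, for a Gaussian prediction the CRPS has a well-known closed form, sketched below; this is the standard formula, not necessarily the exact scoring code used in the experiment:

```python
import math

# Closed-form CRPS for a Gaussian forecast N(mu, sigma^2) and an observed value:
# CRPS = sigma * ( z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ), z = (obs - mu)/sigma.
def crps_gaussian(mu, sigma, observed):
    z = (observed - mu) / sigma
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)          # phi(z)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))                 # Phi(z)
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))

# A prediction centered exactly on the observation still scores
# sigma * (sqrt(2) - 1) / sqrt(pi), reflecting the forecast's spread.
print(round(crps_gaussian(100.0, 5.0, 100.0), 4))  # 1.1685
```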
In the experiment, we expected the participants would develop an ideal MEBN model. The
following figure shows ideal conditional relationships between random variables for the simple
heating machinery. In the ideal model, there are three parts: A situation group, an actual target
group, and a report group. The situation group contains a random variable representing an overall
total cost for this system. The actual target group contains three random variables (a temperature
for an input slab, an actual energy for heating, and a temperature for an output slab). The report
group contains two random variables (an observed/sensed temperature for the input slab and an
observed/sensed temperature for the output slab).
Fig. 12 Ideal conditional relationships between random variables for the simple heating machinery
In the experiment, we observed where the participants spent the most time. Table 12 shows the
time-consuming tasks in the experiment. The mark "X" in the table means that the task was
time-consuming for that method.
Time-consuming tasks in the experiment            Group A (UMP-ST)   Group B (HML)
1. Following process (UMP-ST or HML)              X                  - (Supported by the HML tool)
2. Finding structure model/rules                  X                  X (Supported by the PSAW-MEBN reference model)
3. Finding entity/RV/MFrag from relational data   X                  - (Supported by MEBN-RM)
4. Finding parameter                              X                  - (Supported by MEBN parameter learning)
(1) Following the UMP-ST process: Although the participants had studied UMP-ST, it was not
easy for them to follow the process. They did not have much experience developing a MEBN
model using UMP-ST, so they were not familiar with the process. They recalled the process by
reading a UMP-ST paper and developed their models according to each step of UMP-ST. For
Group B, the HML tool supported the development of a MEBN model: by clicking buttons in the
HML tool, each step in HML was shown and the participants could build their models quickly. (2)
Finding Structure Model/Rules: The participants in both Groups were required to find the
structure model for the simple heating machinery. Although knowledge of the simple heating
machinery situation was given, the participants in both groups struggled to find the structure
model and rules. Group B was taught about the PSAW-MEBN reference model [Park et al., 2014].
The PSAW-MEBN reference model provides knowledge about a set of random variable groups
(Situation, Actual Target, and Report) and causal relationships (i.e., rules) for PSAW. However,
such knowledge did not have much influence on the development time for the structure model
and rules, because the context for the simple heating machinery was too simple to use the PSAW-
MEBN reference model. So, the participants in the two groups thought about their models in
similar ways. However, the participants could not be sure whether or not their models were
correct, so they spent relatively more time thinking about their structure models and rules. (3)
Finding entity/RV/MFrag from the RDB: The participants in Group A could not be sure which
elements in the RDB could be entities/RVs/MFrags in MEBN, so they spent time figuring this out.
On the other hand, the participants in Group B used the HML tool containing MEBN-RM, so they
did not need to consider this step much. (4) Finding CLD: The participants in Group A looked at
the data to find normal distributions and regression models for the RVs, while the participants in
Group B used the MEBN parameter learning built into the HML tool.
6 Conclusion
In this research, we introduced a new development framework for MEBN, which provides a semantically rich representation that also captures uncertainty. MEBN has been used to develop Artificial Intelligence (AI) systems, but MEBN models for such systems have traditionally been constructed manually with the help of domain experts, a process that is labor-intensive and insufficiently agile. To address this problem, we introduced a development framework (HML) that combines machine learning with subject matter expertise to construct MEBN models, and we presented a parameter learning method for MEBN. We conducted an experiment comparing HML with an existing MEBN modeling process in terms of development efficiency. In conclusion, HML could be used to develop a MEBN model more quickly than the existing approach. A future step for HML is to apply it to realistic AI systems. HML should also be more thoroughly investigated in terms of efficiency (agility in developing a reasoning model) and effectiveness (producing a correct reasoning model).
\theta_k^* = \frac{C[x_k]}{\sum_{q=1}^{N} C[x_q]}, \qquad (A.3)
where C[.] is a function returning the number of times a value xk ∈ Val(X) of an RV X appears in D, and N = |Val(X)|. Note that, for a variable X, the function Val(X) returns the set of values of X. For example, suppose that there is an RV X for an observation Di. The RV X takes two values, x1 = H and x2 = T, and there is a set of observations D = {H, H, H, T}. The counts of x1 and x2 in D are obtained using the function C[.]: C[x1] = 3 and C[x2] = 1. Using Equation A.3, we can calculate the maximum likelihood estimates for x1 and x2 as θ1* = 3/4 and θ2* = 1/4, respectively.
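As a minimal Python sketch of the count-based estimate in Equation A.3 (the function name `mle` is illustrative, not part of any tool described here):

```python
from collections import Counter

def mle(observations):
    """Maximum likelihood estimate for a categorical RV (Equation A.3):
    theta_k* = C[x_k] / sum_q C[x_q]."""
    counts = Counter(observations)            # the counting function C[.]
    total = sum(counts.values())              # sum of counts over Val(X)
    return {value: count / total for value, count in counts.items()}

# The coin example from the text: D = {H, H, H, T}
print(mle(["H", "H", "H", "T"]))  # {'H': 0.75, 'T': 0.25}
```

This reproduces θ1* = 3/4 and θ2* = 1/4 from the example above.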
We can use MLE in a Bayesian network (BN) to estimate the parameters of an RV in the BN. Suppose that there is an RV Xi in the BN, xk is a value of the RV Xi (i.e., xk ∈ Val(Xi)), there is a set of parent RVs for the RV (i.e., Pa(Xi) = U), and u is an instantiation of the set of parent RVs (i.e., u ∈ Val(U)). If we assume that each RV Xi follows a multinomial distribution and that the observations associated with the RV Xi are independent and identically distributed, then the maximum likelihood estimator for the value xk given u in the BN can be written as Equation A.4.
\theta^{i*}_{x_k \mid u} = \frac{C[x_k, u]}{\sum_{q=1}^{n} C[x_q, u]}, \qquad (A.4)
where C[xq , u] is the number of times observation xq in X and its parent observation u in Val(U)
appears in D.
For example, assume that there is a node X1 in a BN with Val(X1) = {x1 = T, x2 = F}, a set of parent nodes for X1 (i.e., Pa(X1) = U = {U1}), and Val(U1) = {u1 = A, u2 = B}. Also, there is a data set D = {D1 = {T, A}, D2 = {T, A}, D3 = {F, A}}, where the first value in Dk is for X1 and the second value in Dk is for U1. If u = u1 = A, then θ*_{x1|u1} = 2/3.
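The conditional estimate of Equation A.4 can be sketched the same way; here each observation is a (child, parent) pair, and the names are again illustrative:

```python
from collections import Counter

def conditional_mle(data, child_value, parent_value):
    """MLE for P(X = x_k | U = u) in a BN (Equation A.4):
    C[x_k, u] / sum_q C[x_q, u], where data holds (child, parent) pairs."""
    joint = Counter(data)                                   # joint counts C[x, u]
    parent_total = sum(count for (x, u), count in joint.items()
                       if u == parent_value)                # sum_q C[x_q, u]
    return joint[(child_value, parent_value)] / parent_total

# The example from the text: D = {D1 = (T, A), D2 = (T, A), D3 = (F, A)}
D = [("T", "A"), ("T", "A"), ("F", "A")]
print(conditional_mle(D, "T", "A"))  # 2/3
```

This reproduces θ*_{x1|u1} = 2/3 from the worked example.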
P(\theta \mid \mathbf{D}) = \frac{P(\mathbf{D} \mid \theta)\, P(\theta)}{P(\mathbf{D})}, \qquad (A.5)
P(\theta \mid \mathbf{D}, \alpha) = \frac{P(\mathbf{D} \mid \theta)\, P(\theta \mid \alpha)}{\int_{\theta} P(\mathbf{D} \mid \theta)\, P(\theta \mid \alpha)\, d\theta}. \qquad (A.6)
We use this posterior distribution containing the hyperparameter (Equation A.6) to compute the predictive distribution. The predictive distribution is the distribution of a new observation given the past observations.

P(D_{new} \mid \mathbf{D}, \alpha) = \int_{\theta} P(D_{new} \mid \theta)\, P(\theta \mid \mathbf{D}, \alpha)\, d\theta, \qquad (A.7)

where D_{new} is a new observation that is independent of the past IID observations D given a parameter θ.
In Equation A.7, the predictive distribution integrates over all parameters, combining the likelihood of the new observation with the posterior distribution (Equation A.6). To compute the predictive distribution (Equation A.7), we must first deal with the posterior distribution (Equation A.6). If there is no closed form expression for the integral in the denominator of Equation A.6, we may need to approximate the posterior distribution. If a closed form expression exists and the prior distribution and the likelihood are a conjugate pair, then an exact posterior distribution can be found.
A probability distribution in the exponential family (e.g., normal, exponential, and gamma) has a conjugate prior [Gelman et al., 2014]. Consider an RV X with a categorical probability distribution. For such a distribution, the Dirichlet distribution is the commonly used conjugate prior. Using a Dirichlet prior, the predictive distribution takes a compact form [Koller & Friedman, 2009].
P(D_{new} = x_k \mid \mathbf{D}, \alpha) = \frac{\alpha_k + C[x_k]}{\sum_{j} \alpha_j + \sum_{q=1}^{N} C[x_q]}, \qquad (A.8)
where C[.] is a function returning the number of times a value xk ∈ Val(X) of a variable X appears in D, N = |Val(X)|, α is the hyperparameter, and αj is a sub-hyperparameter of the Dirichlet distribution, as shown in the following.
\theta \sim \mathrm{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_N) \quad \text{if} \quad P(\theta) \propto \prod_{j=1}^{N} \theta_j^{\alpha_j - 1}, \qquad (A.9)
where the sub-hyperparameter αj can be interpreted as the number of samples of the j-th value that have already been observed [Koller & Friedman, 2009].
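The smoothing effect of the Dirichlet pseudo-counts in Equation A.8 can be illustrated on the coin data from the MLE example; the function name and the choice of a uniform prior are assumptions for illustration:

```python
from collections import Counter

def dirichlet_predictive(observations, value, alpha):
    """Posterior predictive for a categorical RV (Equation A.8):
    P(D_new = x_k | D, alpha) = (alpha_k + C[x_k]) / (sum_j alpha_j + sum_q C[x_q])."""
    counts = Counter(observations)            # the counting function C[.]
    return (alpha[value] + counts[value]) / (sum(alpha.values()) + sum(counts.values()))

# Coin data D = {H, H, H, T} with uniform pseudo-counts alpha = {H: 1, T: 1}
D = ["H", "H", "H", "T"]
print(dirichlet_predictive(D, "T", {"H": 1, "T": 1}))  # (1 + 1) / (2 + 4) = 1/3
```

Compared with the MLE of 1/4 for T, the pseudo-counts pull the estimate toward the uniform prior.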
The Bayesian approach above can be used for BN parameter learning. If the prior distribution for an RV Xi, P(θi | α), is a Dirichlet prior with hyperparameter α = {α_{x1|u}, ..., α_{xN|u}}, then the posterior P(θi | D, α) is a Dirichlet distribution with hyperparameter {α_{x1|u} + C[x1, u], ..., α_{xN|u} + C[xN, u]}, where xk ∈ Val(Xi), u ∈ Val(Pa(Xi) = U), and C[xq, u] is the number of times the observation xq of Xi appears in D together with the parent observation u. Using the Dirichlet posterior, we can derive the predictive distribution for a value of Xi in a BN under two assumptions: (1) local parameter independence and (2) global parameter independence [Heckerman et al., 1995].
P(X_i = x_k \mid U = u, \mathbf{D}, \alpha) = \frac{\alpha_{x_k \mid u} + C[x_k, u]}{\sum_{q=1}^{N} \left(\alpha_{x_q \mid u} + C[x_q, u]\right)}, \qquad (A.10)
where N = |Val(Xi)|.
Equation A.10 shows the posterior predictive distribution for the value xk of the i-th RV Xi in the BN, given a parent value u, the observations D, and a hyperparameter α for the Dirichlet conjugate distribution.
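A minimal sketch of Equation A.10's posterior predictive for a BN node, reusing the data set from the conditional MLE example (illustrative names; the uniform pseudo-counts are an assumption):

```python
from collections import Counter

def bn_dirichlet_predictive(data, child_value, parent_value, alpha):
    """Posterior predictive P(X_i = x_k | U = u, D, alpha) (Equation A.10):
    (alpha_{x_k|u} + C[x_k, u]) / sum_q (alpha_{x_q|u} + C[x_q, u]).
    alpha maps each child value to its Dirichlet hyperparameter given u."""
    joint = Counter(data)                                   # joint counts C[x, u]
    numerator = alpha[child_value] + joint[(child_value, parent_value)]
    denominator = sum(a + joint[(x, parent_value)] for x, a in alpha.items())
    return numerator / denominator

# Same data as the conditional MLE example: D = {(T, A), (T, A), (F, A)}
D = [("T", "A"), ("T", "A"), ("F", "A")]
print(bn_dirichlet_predictive(D, "T", "A", {"T": 1, "F": 1}))  # (1 + 2) / (2 + 3) = 0.6
```

The MLE for the same query was 2/3; the Dirichlet pseudo-counts smooth the estimate toward 1/2.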
Acknowledgements
The research was partially supported by the Office of Naval Research (ONR), under Contract#:
N00173-09-C-4008. We appreciate Dr. Paulo Costa and Mr. Shou Matsumoto for their helpful
comments on this research.
References
Baier, C. & Katoen, J. P. (2008). Principles of Model Checking (Representation and Mind Series).
The MIT Press.
Boyd, J.R. (1976) COL USAF, Destruction and Creation, in R. Coram. Boyd New York, Little,
Brown & Co, 2002
Boyd, J.R. (1987) COL USAF, in Patterns of Conflict, unpubl. Briefing by COL J.R. Boyd,
USAF.
Carvalho, R. N., Laskey, K. B., & Costa, P. C. (2017). PR-OWL–a language for defining
probabilistic ontologies. International Journal of Approximate Reasoning, 91, 56-79.
Carvalho, R. N., Laskey, K. B., & Da Costa, P. C. (2016). Uncertainty modeling process for
semantic technology (No. e2045v1). PeerJ Preprints.
Codd, E. F. (1969). Derivability, Redundancy, and Consistency of Relations Stored in Large Data
Banks. IBM Research Report.
Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Communications
of the ACM.
Costa, P. C. G. (2005). Bayesian Semantics for the Semantic Web. PhD Dissertation. George
Mason University.
Costa, P. C. G., Laskey, K. B., Takikawa, M., Pool, M., Fung, F., & Wright, E. J. (2005). MEBN
Logic: A Key Enabler for Network Centric Warfare. In Proceedings of the Tenth
International Command and Control Research and Technology Symposium (10th ICCRTS).
McLean, VA, USA: CCRP/DOD publications.
Costa, P. C. G., Laskey, K. B., Chang, K. C., Sun, W., Park, C. Y., & Matsumoto, S. (2012).
High-Level Information Fusion with Bayesian Semantics. Proceedings of the Ninth Bayesian
Modelling Applications Workshop, held at the Conference on Uncertainty in Artificial
Intelligence (BMAW UAI 2012).
Date, C. J. (2007). Logic and Databases: The Roots of Relational Theory. Trafford publishing.
Date, C. J. (2011). SQL and relational theory: how to write accurate SQL code. O'Reilly Media,
Inc.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014).
Bayesian data analysis (Vol. 2). Boca Raton, FL: CRC press.
Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation.
Journal of the American Statistical Association, 102(477), 359-378.
Golestan, K. (2016). Information Fusion Methodology for Enhancing Situation Awareness in
Connected Cars Environment. PhD Dissertation. University of Waterloo.
Golestan, K., Soua, R., Karray, F., & Kamel, M. S. (2016). Situation awareness within the context
of connected cars: A comprehensive review and recent trends. Information Fusion, 29, 68-83.
Heckerman, D. (1998). A tutorial on learning with Bayesian networks. In M. I. Jordan, editor,
Learning in Graphical Models. MIT Press. Cambridge, MA.
Heckerman, D., Geiger, D., & Chickering, D. M. (1995). Learning Bayesian networks: The
combination of knowledge and statistical data. Machine Learning, 20:197–243.
Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques.
The MIT Press, 1 edition.
Laskey, K. B. (2008). MEBN: A Language for First-Order Bayesian Knowledge Bases. Artificial
Intelligence, 172(2-3).
Laskey, K. B., & Mahoney, S. M. (2000). Network engineering for agile belief network models.
Knowledge and Data Engineering, IEEE Transactions on, 12(4), 487-498.
Laskey, K. B., D’Ambrosio, B., Levitt, T. S., & Mahoney, S. M. (2000). Limited Rationality in
Action: Decision Support for Military Situation Assessment. Minds and Machines, 10(1), 53-
77.
Li, X., Martínez, J., Rubio, G., & Gómez, D. (2016). Context Reasoning in Underwater Robots
Using MEBN. The Third International Conference on Cloud and Robotics (ICCR 2016).
Park, C. Y., Laskey, K. B., Costa, P. C. G., & Matsumoto, S. (2013). Multi-Entity Bayesian
Networks Learning In Predictive Situation Awareness. Proceedings of the 18th International
Command and Control Technology and Research Symposium (ICCRTS 2013).
Park, C. Y., Laskey, K. B., Costa, P. C., & Matsumoto, S. (2014). Predictive Situation
Awareness Reference Model Using Multi-Entity Bayesian Networks. In Information Fusion
(FUSION), 2014 17th International Conference on (pp. 1-8). IEEE.
Park, C. Y., Laskey, K. B., Costa, P. C., & Matsumoto, S. (2016). A Process for Human-aided
Multi-Entity Bayesian Networks Learning in Predictive Situation Awareness. In Information
Fusion (FUSION).
Park, C. Y., Laskey, K. B., Salim, S., & Lee, J. Y. (2017). Predictive Situation Awareness Model
for Smart Manufacturing. In Information Fusion (FUSION).
Patnaikuni, P., Shrinivasan, R., & Gengaje, S. R. (2017). Survey of Multi Entity Bayesian
Networks (MEBN) and its applications in probabilistic reasoning. International Journal of
Advanced Research in Computer Science, 8(5).
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.
San Mateo, CA, USA: Morgan Kaufmann Publishers.
Reiter, R. (1978). On closed world data bases (pp. 55-76). Springer US.
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
Smith, B. (2003). Ontology. Retrieved September 2, 2016, from
http://ontology.buffalo.edu/smith/articles/ontology_pic.pdf.
Sun, W., Chang, K. C., & Laskey, K. B. (2010). Scalable Inference for Hybrid Bayesian Networks
with Full Density Estimation. Proceedings of the 13th International Conference on Information
Fusion (FUSION 2010).
Suzic, R. (2005, March). A generic model of tactical plan recognition for threat assessment. In
Defense and Security (pp. 105-116). International Society for Optics and Photonics.
Wright, E., Mahoney, S. M., Laskey, K. B., Takikawa, M., & Levitt, T. (2002). Multi-Entity
Bayesian Networks for Situation Assessment. Proceedings of the Fifth International
Conference on Information Fusion.