Reconstruction of Markov Random Fields From Samples: Some Observations and Algorithms
Abstract
Markov random fields are used to model high dimensional distributions in a number
of applied areas. Much recent interest has been devoted to the reconstruction of the
dependency structure from independent samples from the Markov random fields. We
analyze a simple algorithm for reconstructing the underlying graph defining a Markov
random field on n nodes with maximum degree d given observations. We show that
under mild non-degeneracy conditions it reconstructs the generating graph with high
probability using Θ(dε⁻²δ⁻⁴ log n) samples, where ε, δ depend on the local interactions.
For most local interactions, ε, δ are of order exp(−O(d)).
Our results are optimal as a function of n up to a multiplicative constant depending
on d and the strength of the local interactions. Our results seem to be the first results
for general models that guarantee that the generating model is reconstructed. Further-
more, we provide an explicit O(n^{d+2} ε⁻²δ⁻⁴ log n) running time bound. In cases where
the measure on the graph has correlation decay, the running time is O(n² log n) for all
fixed d. We also discuss the effect of observing noisy samples and show that as long as
the noise level is low, our algorithm is effective. On the other hand, we construct an
example where large noise implies non-identifiability even for generic noise and
interactions. Finally, we briefly show that in some simple cases, models with hidden
nodes can also be recovered.
1 Introduction
In this paper we consider the problem of reconstructing the graph structure of a Markov
random field from independent and identically distributed samples. Markov random fields
(MRF) provide a very general framework for defining high dimensional distributions and
the reconstruction of the MRF from observations has attracted much recent interest, in
particular in biology, see e.g. [9] and a list of related references [10].
∗ Department of Electrical Engineering and Computer Sciences, U.C. Berkeley. Email: [email protected]. Supported by a Vodafone US-Foundation fellowship.
† Dept. of Statistics and Dept. of Electrical Engineering and Computer Sciences, U.C. Berkeley. Email: [email protected]. Supported by a Sloan fellowship in Mathematics, by NSF Career award DMS-0548249, NSF grant DMS-0528488 and ONR grant N0014-07-1-05-06.
‡ Dept. of Statistics, U.C. Berkeley. Email: [email protected]. Supported by NSF grants DMS-0528488 and DMS-0548249.
1.1 Our Results
We give sharp, up to a multiplicative constant, estimates for the number of independent
samples needed to infer the underlying graph of a Markov random field of bounded degree.
In Theorem 1 we use a simple information-theoretic argument to show that Ω(d log n)
samples are required to reconstruct a randomly selected graph on n vertices with maximum
degree at most d. Then in Theorems 2 and 3 we propose two algorithms for reconstruction
that use only O(dε⁻²δ⁻⁴ log n) samples, where ε and δ are lower bounds on marginal
distributions in the neighborhood of a vertex. Under mild non-degeneracy conditions
ε, δ = exp(−O(d)), and for some models ε and δ are only inverse polynomial in d. An
example of the latter is the hardcore model with fugacity λ = Θ(1/d). Our main focus is
on the reconstruction of sparse MRFs, where d is fixed, in which case ε and δ are constants.
The two theorems differ in their running time and the required non-degeneracy conditions.
It is clear that non-degeneracy conditions are needed to ensure that there is a unique graph
associated with the observed probability distribution.
In addition to the fully-observed setting in which samples of all variables are available,
we extend our algorithm in several directions. In Section 5 we consider the problem of noisy
observations. In subsection 5.1 we show by way of an example that if some of the random
variables are perturbed by noise then it is in general impossible to reconstruct the graph
structure with probability approaching 1. Conversely, when the noise is relatively weak as
compared to the coupling strengths between random variables, we show that the algorithms
used in Theorems 2 and 3 reconstruct the graph with high probability. Furthermore, we
study the problem of reconstruction with partial observations, i.e. samples from only a
subset of the nodes are available. In Theorem 5 we provide sufficient conditions on the
probability distribution for correct reconstruction.
Chickering [2] showed that maximum-likelihood estimation of the underlying graph of
a Markov random field is NP-complete. This does not contradict our results, which assume
that the data is generated from a model (or a model with a small amount of noise). Although
the algorithm we propose runs in time polynomial in the size of the graph, the dependence
on the degree (the running time is O(n^{d+2} ε⁻²δ⁻⁴ log n)) may impose too high a computational
cost for some applications. Indeed, for some Markov random fields exhibiting a decay of
correlation a vast improvement can be realized: a modified version of the algorithm runs in
time O(dn² ε⁻²δ⁻⁴ log n). This is proven in Theorem 4.
apply. These are summarized in the table below. The first line refers to the type of models
that each method covers: does the model allow clique interactions or just edge interactions?
The next two lines refer to requirements on the strength of interactions: are they not
required to be too weak (or are only edges with strong interactions returned)? are they not
required to be too strong? The next line refers to the hardness of verifying whether a given
model satisfies the conditions of the algorithm (where X denotes that the verification is
exponential in the size of the model). The following line refers to the question: is there a
guarantee that the generating model is returned with high probability? The final two lines
refer to computational and sampling complexity, where c_d denotes constants that depend
on d.
Method               AKN [3]     WRL [5]         Alg          High Temp Alg
Cliques              √           X               √            √
No Int. Low. Bd.     √           X               X            X
No Int. Upp. Bd.     √           X               √            X
Verifiable Conds.    √           X               √            √
Output Gen. Model    X           √               √            √
Comp. Compl.         n^{O(d)}    n^5             n^{O(d)}     c_d n^2 log n
Sampl. Compl.        n^{O(d)}    poly(d) log n   c_d log n    c_d log n
Abbeel et al. [3] considered the problem of reconstructing graphical models based on
factor graphs, and proposed an algorithm with polynomial time and sample complexity.
However, the goal of their algorithm was not to reconstruct the true structure, but rather
to produce a model whose distribution is close in Kullback-Leibler divergence to the true
distribution. In applications it is often of interest to reconstruct the true structure, which
gives some insight into the underlying structure of the inferred model.
Note furthermore that two networks that differ only in the neighborhood of one node
will have O(1) KL distance. Therefore, even in cases where it is promised that the KL
distance between the generating distribution and any other distribution defined by another
graph is as large as possible, that lower bound on the KL distance is only Ω(1). Plugging
this into the bounds in [3] yields a sampling complexity polynomial in the size of the
network in order to find the generating network, compared to our logarithmic sampling
complexity. For other work based on minimizing the KL divergence see the references in [3].
The same problem as in the present work (but restricted to the Ising model) was studied
by Wainwright et al. [5], where an algorithm based on ℓ₁-regularization was introduced.
The algorithm presented is efficient even for dense graphs, with running time O(n⁵), but is
applicable only in very restricted settings. The work applies only to the Ising model and,
more importantly, only to models with edge interactions (no larger cliques are allowed). The
most important restrictions are the two conditions in the paper (A1 and A2). Condition A1
requires (among other things) that the "covariates [spins] do not become overly dependent".
Verifying when the condition holds seems hard. However, it is easy to see that this condition
fails for standard models such as the Ising model on the lattice or on random d-regular
graphs when the model is at low temperature, i.e. for β > ½ log(1 + √2) in the case of the
two-dimensional Ising model and β > tanh⁻¹(1/(d − 1)) for random d-regular graphs.
Subsequent to our work being posted on the arXiv, Santhanam and Wainwright [4]
considered essentially the same problem for the Ising model, producing nearly matching
lower and upper bounds on the asymptotic sampling complexity. Again their conditions do
not apply to the low temperature regime. Another key difference from our work is that they
restrict attention to the Ising model, i.e. Markov random fields with pairwise potentials
where each variable takes two values. Our results are not limited to pairwise interactions
and apply to the more general setting of MRFs with potentials on larger cliques.
2 Preliminaries
We begin with the definition of a Markov random field. A collection of random variables
X = (X(v))_{v∈V}, taking values in A^V, forms a Markov random field with respect to the
graph G = (V, E) if it satisfies the Markov property

    P(X(W) = x_W | X(S) = x_S, X(U) = x_U) = P(X(W) = x_W | X(S) = x_S)    (1)

when W, U and S are disjoint subsets of V such that every path in G from W to U passes
through S, and where X(U) denotes the restriction of X from A^V to A^U for U ⊂ V.
Such a distribution can be written in the form

    P(x) = (1/Z) exp( Σ_a Ψ_a(x_a) ),    (2)

where Z is a normalizing constant, a ranges over the cliques in G, and Ψ_a : A^{|a|} → R ∪ {−∞}
are functions called potentials.
The problem we consider is that of reconstructing the graph G, given k independent
samples X = {X¹, …, X^k} from the model. Denote by G_d the set of labeled graphs with
maximum degree at most d. We assume that the graph G ∈ G_d is from this class. A
structure estimator (or reconstruction algorithm) Ĝ : A^{kn} → G_d is a map from the space
of possible sample sequences to the set of graphs under consideration. We are interested in
the asymptotic relationship between the number of nodes in the graph, n, the maximum
degree d, and the number of samples k that are required. An algorithm using k(n) samples
is deemed successful if in the limit of large n the probability of reconstruction error
approaches zero.
3 Lower Bounds
Theorem 1. Let the graph G be drawn according to the uniform distribution on G_d. Then
there exists a constant c = c(A) > 0 such that if k ≤ cd log n, then for any estimator
Ĝ : X → G_d, the probability of correct reconstruction is P(Ĝ = G) = o(1).
Remark 1. Note that the theorem above doesn’t need to assume anything about the poten-
tials. The theorem applies for any potentials that are consistent with the generating graph.
In particular, it is valid both in cases where the graph is “identifiable” given many samples
and in cases where it isn’t.
Proof. To begin, we note that the probability of error is minimized by letting Ĝ be the
maximum a posteriori (MAP) decision rule,

    Ĝ_MAP(X) = argmax_{g ∈ G_d} P[G = g | X].

By the optimality of the MAP rule, a lower bound on its error probability bounds the
probability of error using any estimator. Now, the MAP estimator Ĝ_MAP(X) is a
deterministic function of X. Clearly, if a graph g is not in the range of Ĝ_MAP, then the
algorithm always makes an error when G = g. Let S be the set of graphs in the range of
Ĝ_MAP, so P(error | g ∈ S^c) = 1. We have

    P(error) = Σ_{g ∈ G_d} P(error | G = g) P(G = g)
             = Σ_{g ∈ S} P(error | G = g) P(G = g) + Σ_{g ∈ S^c} P(error | G = g) P(G = g)
             ≥ Σ_{g ∈ S^c} P(G = g) = 1 − |S|/|G_d|                                  (3)
             ≥ 1 − A^{nk}/|G_d|,

where the last step follows from the fact that |S| ≤ |X| ≤ A^{nk}. It remains only to express
the number of graphs with maximum degree at most d, |G_d|, in terms of the parameters n, d.
The following lemma gives an adequate bound.
Lemma 1. Suppose d ≤ n^α with α < 1. Then the number of graphs with maximum degree
at most d, |G_d|, satisfies

    log |G_d| = Ω(nd log n).    (4)
Proof. To make the dependence on n explicit, let U_{n,d} be the number of graphs with n
vertices and maximum degree at most d. We first bound U_{n+2,d} in terms of U_{n,d}. Given a
graph G with n vertices and degree at most d, add two vertices a and b. Select d distinct
neighbors v₁, …, v_d for vertex a, with d labeled edges; there are \binom{n}{d} d! ways to do this. If v_i
already has degree d in G, then v_i has at least one neighbor u that is not a neighbor of a,
since there are only d − 1 other neighbors of a. Remove the edge (v_i, u) and place an edge
labeled i from vertex b to u. This is done for each vertex v₁, …, v_d, so b has degree at most
d. The graph G can be reconstructed from the resulting labeled graph on n + 2 vertices as
follows: remove vertex a, and return the neighbors of b to their correct original neighbors
(this is possible because the edges are labeled).
Removing the labels on the edges from a and b sends at most (d!)² edge-labeled graphs
of this type on n + 2 vertices to the same unlabeled graph. Hence, the number of graphs
with maximum degree d on n + 2 vertices is lower bounded as

    U_{n+2,d} ≥ U_{n,d} \binom{n}{d} d! \frac{1}{(d!)^2} = U_{n,d} \binom{n}{d} \frac{1}{d!}.    (5)
If n is odd, it suffices to note that U_{n+1,d} ≥ U_{n,d}. Taking the logarithm of equation (5)
yields

    log U_{n,d} = Ω(nd(log n − log d)) = Ω(nd log n),    (6)

assuming that d ≤ n^α with α < 1.
Together with equation (3), Lemma 1 implies that for small enough c, if the number of
samples k ≤ cd log n, then

    P(error) ≥ 1 − A^{nk}/|G_d| = 1 − o(1).
4 Reconstruction
We now turn to the problem of reconstructing the graph structure of a Markov random
field from samples. For a vertex v we let N(v) = {u ∈ V − {v} : (u, v) ∈ E} denote the set
of neighbors of v. Determining the neighbors of v for every vertex in the graph is sufficient
to determine all the edges of the graph and hence reconstruct the graph. We test each
candidate neighborhood of size at most d by using the Markov property, which states that
for each w ∈ V − (N(v) ∪ {v}),

    P(X(v) = x_v | X(N(v)) = x_{N(v)}, X(w) = x_w) = P(X(v) = x_v | X(N(v)) = x_{N(v)}).    (7)
We give two theorems for reconstructing networks; they differ in their non-degeneracy
conditions and their running time. The first one, immediately below, has more stringent
non-degeneracy conditions and faster running time.
Theorem 2. Suppose there exist ε, δ > 0 such that the following condition holds: for all
v ∈ V and all U ⊂ V − {v} with |U| ≤ d and N(v) ⊄ U, there exist w ∈ V − (U ∪ {v}) and
values x_v, x_U, x_w, x′_w ∈ A such that

    |P(X(v) = x_v | X(U) = x_U, X(w) = x_w) − P(X(v) = x_v | X(U) = x_U, X(w) = x′_w)| > ε    (8)

and

    P(X(U) = x_U, X(w) = x_w) > δ,    P(X(U) = x_U, X(w) = x′_w) > δ.    (9)

Then with the constant C = 81(d + 2)/(2dε²δ⁴) + C₁, when k > Cd log n, there exists an
estimator Ĝ(X) such that the probability of correct reconstruction is P(G = Ĝ(X)) = 1 − o(1).
The estimator Ĝ is computable in time O(n^{d+2} ε⁻²δ⁻⁴ log n).
Remark 2. Condition (8) captures the notion that each edge should have sufficient strength.
Condition (9) is required so that we can accurately calculate the empirical conditional
probabilities.
Proof. Let P̂ denote the empirical probability measure from the k samples. Azuma's in-
equality gives that if Y ∼ Bin(k, p) then

    P(|Y/k − p| > γ) ≤ 2 exp(−2γ²k).    (10)

It follows that

    |P̂(X(u₁) = x₁, …, X(u_l) = x_l) − P(X(u₁) = x₁, …, X(u_l) = x_l)| ≤ γ    (11)

holds, except with probability exponentially small in k, for all {u_i}_{i=1}^l and {x_i}_{i=1}^l. If we
additionally have l ≤ d + 2 and k ≥ C(γ)d log n, then equation (11) holds simultaneously
for all such events with probability at least 1 − A^{d+2}n^{d+2} · 2/n^{2γ²C(γ)d}. Choosing
C(γ) = (d + 2)/(2dγ²) + C₁, equation (11) holds with probability at least 1 − 2A^{d+2}/n^{2dγ²C₁}.
For the remainder of the proof assume (11) holds. Taking

    γ = εδ²/9,    (12)

the error in the empirical conditional probabilities can be bounded, whenever
P(X(U) = x_U) > δ, as

    |P̂(X(v) = x_v | X(U) = x_U) − P(X(v) = x_v | X(U) = x_U)|
        ≤ γ/P(X(U) = x_U) + |1/P̂(X(U) = x_U) − 1/P(X(U) = x_U)|
        ≤ γ/δ + γ/((δ − γ)δ)
        ≤ εδ²/(9δ) + εδ²/(9(δ − εδ²/9)δ)
        = εδ/9 + ε/(9 − εδ) < ε/4.    (13)
For each vertex v ∈ V we consider all candidate neighborhoods for v, subsets U ⊂ V − {v}
with |U| ≤ d. The estimate (13) and the triangle inequality imply that if N(v) ⊆ U then,
by the Markov property,

    |P̂(X(v) = x_v | X(U) = x_U, X(w) = x_w) − P̂(X(v) = x_v | X(U) = x_U, X(w) = x′_w)| < ε/2    (14)

for all w ∈ V − (U ∪ {v}) and all x_v, x_U, x_w, x′_w satisfying the empirical analogue of (9).
Conversely, by conditions (8) and (9) and the estimate (13), we have that for any U with
N(v) ⊄ U there exist some w ∈ V and x_U, x_w, x′_w, x_v ∈ A such that

    |P̂(X(v) = x_v | X(U) = x_U, X(w) = x_w) − P̂(X(v) = x_v | X(U) = x_U, X(w) = x′_w)| > ε/2,    (15)

i.e. equation (15) holds and hence equation (14) does not hold. Thus, choosing the smallest
set U such that (14) holds gives the correct neighborhood.
To summarize, with number of samples

    k = (81(d + 2)/(2dε²δ⁴) + C₁) d log n,

the algorithm recovers the correct neighborhood of every vertex, and hence the generating
graph, with probability 1 − o(1).
The analysis of the running time is straightforward. There are n nodes, and for each
node we consider O(n^d) neighborhoods. For each candidate neighborhood, we check ap-
proximately O(n) nodes and perform a correlation test of complexity O(log n).
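To make the procedure concrete, the following sketch (ours, not taken from the paper)
implements the neighborhood test of Theorem 2 on a k × n integer array of samples over a
finite alphabet. The function names, the δ/2 empirical cutoff, and the use of numpy are
illustrative assumptions.

    import numpy as np
    from itertools import combinations, product

    def cond_prob(samples, v, xv, cond_vars, cond_vals):
        # Empirical P(X(v) = xv | X(cond_vars) = cond_vals); only called
        # after the conditioning event is verified to have positive mass.
        mask = np.all(samples[:, cond_vars] == cond_vals, axis=1)
        return np.mean(samples[mask, v] == xv)

    def passes_markov_test(samples, v, U, eps, delta, alphabet):
        # Check (14): for every extra vertex w and every likely pair of
        # conditionings differing only in X(w), the empirical conditional
        # law of X(v) changes by less than eps/2.
        n = samples.shape[1]
        for w in range(n):
            if w == v or w in U:
                continue
            cv = list(U) + [w]
            for xU in product(alphabet, repeat=len(U)):
                for xw, xw2 in combinations(alphabet, 2):
                    a, b = list(xU) + [xw], list(xU) + [xw2]
                    # only test where both conditioning events are likely, cf. (9)
                    if np.mean(np.all(samples[:, cv] == a, axis=1)) <= delta / 2:
                        continue
                    if np.mean(np.all(samples[:, cv] == b, axis=1)) <= delta / 2:
                        continue
                    for xv in alphabet:
                        if abs(cond_prob(samples, v, xv, cv, a)
                               - cond_prob(samples, v, xv, cv, b)) > eps / 2:
                            return False
        return True

    def neighborhood(samples, v, d, eps, delta, alphabet):
        # Return the smallest candidate set U (|U| <= d) passing the test,
        # as in the proof of Theorem 2.
        n = samples.shape[1]
        others = [u for u in range(n) if u != v]
        for size in range(d + 1):
            for U in combinations(others, size):
                if passes_markov_test(samples, v, list(U), eps, delta, alphabet):
                    return set(U)
        return set()

Running neighborhood(...) for every vertex and joining the results yields the edge set; the
loop structure mirrors the n · O(n^d) · O(n) accounting above.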
Theorem 3. For a set U = {u₁, …, u_l} and an assignment x_U ∈ A^l, define x^i_U
to be the assignment obtained from x_U by replacing the ith element by x′_{u_i}. Suppose there
exist ε, δ > 0 such that the following condition holds: for all v ∈ V, if N(v) = {u₁, …, u_l},
then for each i, 1 ≤ i ≤ l, and for any set W ⊂ V − ({v} ∪ N(v)) with |W| ≤ d there exist
values x_v, x_{u₁}, …, x_{u_i}, …, x_{u_l}, x′_{u_i} ∈ A and x_W ∈ A^{|W|} such that

    |P(X(v) = x_v | X(N(v)) = x_{N(v)}, X(W) = x_W)
        − P(X(v) = x_v | X(N(v)) = x^i_{N(v)}, X(W) = x_W)| > ε    (16)

and

    P(X(N(v)) = x_{N(v)}, X(W) = x_W) > δ,
    P(X(N(v)) = x^i_{N(v)}, X(W) = x_W) > δ.    (17)

Then for some constant C = C(ε, δ) > 0, if k > Cd log n then there exists an estimator
Ĝ(X) such that the probability of correct reconstruction is P(G = Ĝ(X)) = 1 − o(1). The
estimator Ĝ is computable in time O(n^{2d+1} log n).
Proof. As in Theorem 2, we can assume that with high probability we have

    |P̂(X(u₁) = x₁, …, X(u_l) = x_l) − P(X(u₁) = x₁, …, X(u_l) = x_l)| ≤ γ    (18)

for all {u_i}_{i=1}^l and {x_i}_{i=1}^l when l ≤ 2d + 1 and k ≥ C(γ)d log n, so we assume that (18)
holds. For each vertex v ∈ V we consider all candidate neighborhoods for v, subsets
U = {u₁, …, u_l} ⊂ V − {v} with 0 ≤ l ≤ d. For each candidate neighborhood U, the
algorithm computes a score

    f(v; U) = min_{W,i} max_{x_v, x_W, x_U, x′_{u_i}}
        |P̂(X(v) = x_v | X(W) = x_W, X(U) = x_U)
            − P̂(X(v) = x_v | X(W) = x_W, X(U) = x^i_U)|    (19)

where the minimum is over W ⊂ V − ({v} ∪ U) with |W| ≤ d and 1 ≤ i ≤ l, and where for
each W, i, the maximum is taken over all x_v, x_W, x_U, x′_{u_i} such that

    P̂(X(W) = x_W, X(U) = x_U) > δ/2  and  P̂(X(W) = x_W, X(U) = x^i_U) > δ/2.    (20)

If U ⊈ N(v), then some u_i ∈ U is not a neighbor of v; taking W ⊇ N(v) − U, the true
value of the quantity in (19) is 0 for this choice of W and i by the Markov property (7).
Assuming that equation (18) holds with γ chosen as in (12), the estimation error in f(v; U)
is at most ε/2 by equation (20), and it holds that f(v; U) < ε/2 for each U ⊈ N(v). Thus
all U ⊈ N(v) are rejected. If U = N(v), then by the Markov property (7) and the
conditions (16) and (17), for any i and W,

    |P(X(v) = x_v | X(W) = x_W, X(U) = x_U) − P(X(v) = x_v | X(W) = x_W, X(U) = x^i_U)| > ε

for some x_v, x_W, x_U, x′_{u_i}. The error in f(v; U) is less than ε/2 as before, hence f(v; U) > ε/2
for U = N(v). Since U = N(v) is the largest set that is not rejected, the algorithm correctly
determines the neighborhood of v for every v ∈ V when (18) holds.
To summarize, with number of samples

    k = (81(2d + 1)/(2dε²δ⁴) + C₁) d log n,

the algorithm recovers the correct neighborhood of every vertex, and hence the generating
graph, with probability 1 − o(1).
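For completeness, here is a sketch of the score f(v; U) from (19), reusing cond_prob from
the earlier sketch; the δ/2 cutoff of (20) and the helper emp are again our illustrative
choices, and for simplicity only sets W of the maximal size are scanned.

    import numpy as np
    from itertools import combinations, product

    def emp(samples, vars_, vals):
        # Empirical probability of the event X(vars_) = vals.
        return np.mean(np.all(samples[:, vars_] == vals, axis=1))

    def score_f(samples, v, U, d, delta, alphabet):
        # f(v;U): min over (W, i) of the largest empirical change in the
        # conditional law of X(v) when coordinate i of x_U is flipped,
        # restricted to conditioning events of empirical mass > delta/2.
        n = samples.shape[1]
        rest = [w for w in range(n) if w != v and w not in U]
        best = float("inf")
        for W in combinations(rest, min(d, len(rest))):
            for i in range(len(U)):
                worst = 0.0
                cv = list(U) + list(W)
                for xU in product(alphabet, repeat=len(U)):
                    for xW in product(alphabet, repeat=len(W)):
                        for xu2 in alphabet:
                            if xu2 == xU[i]:
                                continue
                            xUi = list(xU); xUi[i] = xu2
                            a = list(xU) + list(xW)
                            b = xUi + list(xW)
                            if emp(samples, cv, a) <= delta / 2:
                                continue
                            if emp(samples, cv, b) <= delta / 2:
                                continue
                            for xv in alphabet:
                                worst = max(worst, abs(
                                    cond_prob(samples, v, xv, cv, a)
                                    - cond_prob(samples, v, xv, cv, b)))
                best = min(best, worst)
        return best  # reject U when score < eps/2; keep the largest survivor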
Proposition 1 (Models with soft constraints). In a graphical model with maximum degree
d given by equation (2), suppose that all the potentials Ψ_uv satisfy ‖Ψ_uv‖_∞ ≤ K and

    max_{x₁,x₂,x₃,x₄ ∈ A} |Ψ_uv(x₁, x₂) − Ψ_uv(x₃, x₂) − Ψ_uv(x₁, x₄) + Ψ_uv(x₃, x₄)| > γ,    (21)

for some γ > 0. Then there exist ε, δ > 0 depending only on d, K and γ such that the
hypothesis of Theorem 3 holds.
Proof. It is clear that for some sufficiently small δ = δ(d, m, K) > 0 we have, for all
u₁, …, u_{2d+1} ∈ V and x_{u₁}, …, x_{u_{2d+1}} ∈ A, that

    P(X(u₁) = x_{u₁}, …, X(u_{2d+1}) = x_{u_{2d+1}}) > δ,

so condition (17) holds. Now suppose that u₁, …, u_l is the neighborhood of v. Then for any
1 ≤ i ≤ l it follows from equation (21) that there exist x_v, x′_v, x_{u_i}, x′_{u_i} ∈ A such that
for any x_{u₁}, …, x_{u_{i−1}}, x_{u_{i+1}}, …, x_{u_l} ∈ A, the conditional ratio
P(X(v) = x_v | ·)/P(X(v) = x′_v | ·) changes by a multiplicative factor of at least e^γ when
x_{u_i} is replaced by x′_{u_i}. Since all the conditional probabilities involved are bounded away
from 0 in terms of d, K and m, this implies condition (16) for some ε = ε(d, K, γ) > 0.
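Condition (21) is easy to check numerically for a given pairwise potential; a minimal
sketch, assuming Ψ_uv is supplied as an |A| × |A| numpy array:

    import numpy as np
    from itertools import product

    def soft_constraint_gap(psi):
        # Largest |psi(x1,x2) - psi(x3,x2) - psi(x1,x4) + psi(x3,x4)|;
        # condition (21) requires this to exceed some gamma > 0.
        m = psi.shape[0]
        return max(abs(psi[x1, x2] - psi[x3, x2] - psi[x1, x4] + psi[x3, x4])
                   for x1, x2, x3, x4 in product(range(m), repeat=4))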
Although the results to follow hold more generally, for ease of exposition we will keep
in mind the example of the Ising model with no external magnetic field,

    P(x) = (1/Z) exp( Σ_{(u,v)∈E} β_uv x_u x_v ).    (23)
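To generate test data from (23) one can use a single-site Gibbs sampler. The sketch below
is a standard construction rather than anything from the paper; the burn-in and thinning
parameters are arbitrary illustrative choices.

    import numpy as np

    def gibbs_ising(n, edges, beta, num_samples, burn_in=10000, thin=100, seed=0):
        # Approximate samples from P(x) ∝ exp(sum_{(u,v) in E} beta[(u,v)] x_u x_v),
        # x in {-1,+1}^n, via single-site Gibbs updates.
        rng = np.random.default_rng(seed)
        nbrs = {u: [] for u in range(n)}
        for (u, v) in edges:
            nbrs[u].append((v, beta[(u, v)]))
            nbrs[v].append((u, beta[(u, v)]))
        x = rng.choice([-1, 1], size=n)
        out = []
        for t in range(burn_in + num_samples * thin):
            u = rng.integers(n)
            h = sum(b * x[w] for w, b in nbrs[u])  # local field at u
            # conditional law: P(x_u = +1 | rest) = 1 / (1 + exp(-2h))
            x[u] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * h)) else -1
            if t >= burn_in and (t - burn_in) % thin == 0:
                out.append(x.copy())
        return np.array(out)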
Proposition 2. Consider the Ising model with all coupling parameters satisfying

    c ≤ |β_uv| ≤ C

on a graph G with maximum degree at most d. Then the conditions (16) and (17) of Theorem 3
are satisfied with

    ε ≥ tanh(2c)/(2e^{2C} + 2e^{−2C})

and

    δ ≥ e^{−4dC}/2^{2d}.
Proof. Fix a vertex v ∈ V and let w ∈ N(v) be any vertex in the neighborhood of v. Let
R = N(v) \ {w} be the other neighbors of v. Defining

    A := exp( Σ_{j∈R} x_j β_jv ),

the difference between the conditional probabilities of X(v) given the two values of the
spin at w can be expressed in terms of A and β_wv. It is possible to choose the spins x_R in
such a way that e^{−C} < A < e^{C}. Thus the expression above is at least

    tanh(2c)/(2e^{2C} + 2e^{−2C}).

Moreover, the probability of any assignment of 2d spins can be very crudely bounded as

    P(X(i₁) = x_{i₁}, …, X(i_{2d}) = x_{i_{2d}}) ≥ e^{−4dC}/2^{2d}.
4.4 O(n² log n) Algorithm For Models with Correlation Decay
The reconstruction algorithm of Theorem 3 runs in polynomial time O(dn^{2d+1} ln n). It would
be desirable for the degree of the polynomial to be independent of d, and this can be achieved
for Markov random fields with exponential decay of correlations. For two vertices u, v ∈ V
let d(u, v) denote the graph distance and let d_C(u, v) denote the correlation between the
spins at u and v, defined as

    d_C(u, v) = Σ_{x_u, x_v ∈ A} |P(X(u) = x_u, X(v) = x_v) − P(X(u) = x_u)P(X(v) = x_v)|.

If the interactions are sufficiently weak, the graph will satisfy the Dobrushin-Shlosman con-
dition (see e.g. [8]) and there will be exponential decay of correlations between vertices.
Theorem 4. Suppose that G and X satisfy the hypothesis of Theorem 3, that for
all u, v ∈ V, d_C(u, v) ≤ exp(−αd(u, v)), and that there exists some κ > 0 such that for all
(u, v) ∈ E, d_C(u, v) > κ. Then for some constant C = C(α, κ, ε, δ) > 0, if k > Cd log n,
then there exists an estimator Ĝ(X) such that the probability of correct reconstruction is
P(G = Ĝ(X)) = 1 − o(1) and the algorithm runtime is O(n d^{d ln(4/κ)/α} + dn² ln n) with high
probability.
Proof. Denote the correlation neighborhood of a vertex v as N_C(v) = {u ∈ V : d̂_C(u, v) >
κ/2}, where d̂_C(u, v) is the empirical correlation of u and v. For large enough C, with high
probability, for all v ∈ V we have that N(v) ⊆ N_C(v) ⊆ {u ∈ V : d(u, v) ≤ ln(4/κ)/α}. Now
the size of {u ∈ V : d(u, v) ≤ ln(4/κ)/α} is at most d^{ln(4/κ)/α}, which is independent of n,
so the exhaustive search of Theorem 3 can be run for each v with the candidate sets
restricted to N_C(v).
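The first stage of this procedure only needs the empirical pairwise correlations; here is a
sketch (with illustrative names) that computes d̂_C for all pairs and returns the correlation
neighborhoods N_C(v).

    import numpy as np
    from itertools import product

    def emp_corr(samples, u, v, alphabet):
        # Empirical d_C(u,v) = sum_{xu,xv} |P̂(xu,xv) - P̂(xu) P̂(xv)|.
        total = 0.0
        for xu, xv in product(alphabet, repeat=2):
            puv = np.mean((samples[:, u] == xu) & (samples[:, v] == xv))
            total += abs(puv - np.mean(samples[:, u] == xu)
                               * np.mean(samples[:, v] == xv))
        return total

    def correlation_neighborhoods(samples, kappa, alphabet):
        # N_C(v) = {u : d_C_hat(u,v) > kappa/2}; the exhaustive search of
        # Theorem 3 is then run for each v inside N_C(v) only.
        n = samples.shape[1]
        nc = {v: set() for v in range(n)}
        for u in range(n):
            for v in range(u + 1, n):
                if emp_corr(samples, u, v, alphabet) > kappa / 2:
                    nc[u].add(v); nc[v].add(u)
        return nc

Since each N_C(v) has size independent of n, the per-vertex search costs O(1) and the pair
correlations dominate, giving the O(n² log n) total for fixed d.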
5 Noisy Observations
5.1 An Example of Non-Identifiability
Without assumptions on the underlying model or noise, the Markov random field is not in
general identifiable. In other words, a single probability distribution might correspond to
two different graph structures. Thus, the problem of reconstruction is not well-defined in
such a case. The next example shows that even in the Ising model, under unknown noise it
is impossible to distinguish between a graph with 3 vertices and 2 edges and a graph with
3 vertices and 3 edges.
Example 1. Let V = {u₁, u₂, u₃} be a set of 3 vertices and let G and G̃ be two graphs with
vertex set V and edge sets {(u₁, u₂), (u₁, u₃)} and {(u₁, u₂), (u₁, u₃), (u₂, u₃)} respectively.
Let P and P̃ be Ising models on G and G̃ with edge interactions β₁₂, β₁₃ and β̃₁₂, β̃₁₃, β̃₂₃
respectively, i.e.

    P[X] = (1/Z) exp( β₁₂X(u₁)X(u₂) + β₁₃X(u₁)X(u₃) ),

    P̃[X] = (1/Z̃) exp( β̃₁₂X(u₁)X(u₂) + β̃₁₃X(u₁)X(u₃) + β̃₂₃X(u₂)X(u₃) ).

Suppose that X′(u₁), a noisy version of the spin X(u₁), is observed, which is equal to X(u₁)
with probability p and −X(u₁) with probability 1 − p for some unknown p, while the
spins X(u₂) and X(u₃) are observed perfectly. This is equivalent to adding a new vertex
u′₁ to G and G̃ with an extra edge (u₁, u′₁) and potential Ψ_{(u₁,u′₁)} = β₁₁′X(u₁)X(u′₁). The
spin at u′₁ then represents the noisy observation of the spin at u₁. Suppose that all the β
and β̃ are chosen independently with N(0, 1) distribution and let P′ and P̃′ be the resulting
random noisy distributions on A^{{u′₁,u₂,u₃}}. Then the total variation distance between P′ and
P̃′ is less than 1, and so the graph structure is not identifiable, as we shall show below.
By the symmetry of the Ising model with no external field, the random element P′ can
be parameterized by (p_{1′2}, p_{1′3}, p_{23}) ∈ [0, 1]³ where p_{1′2} = P(X_{u′₁} = 1, X_{u₂} = 1), p_{1′3} =
P(X_{u′₁} = 1, X_{u₃} = 1), and p_{23} = P(X_{u₂} = 1, X_{u₃} = 1). These parameters are given by
explicit functions of the couplings.
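The three parameters can be computed by brute-force enumeration of the 16 configurations
of (X(u₁), X(u₂), X(u₃), X(u′₁)); the sketch below is our illustrative numerical check, not
part of the original argument.

    import numpy as np
    from itertools import product

    def pair_marginals(b12, b13, b23, b11p):
        # Returns (p_{1'2}, p_{1'3}, p_{23}) for the Ising model on u1,u2,u3
        # with the noise vertex u1' attached to u1 by coupling b11p.
        # Use b23 = 0 for the graph G and b23 != 0 for G~.
        Z, m = 0.0, np.zeros(3)
        for x1, x2, x3, x1p in product([-1, 1], repeat=4):
            w = np.exp(b12 * x1 * x2 + b13 * x1 * x3
                       + b23 * x2 * x3 + b11p * x1 * x1p)
            Z += w
            m += w * np.array([(x1p == 1) & (x2 == 1),
                               (x1p == 1) & (x3 == 1),
                               (x2 == 1) & (x3 == 1)], dtype=float)
        return m / Z

Matching these three numbers across the two models is exactly the identifiability question:
a choice of β̃'s with β̃₂₃ ≠ 0 producing the same triple as some choice of β's witnesses
non-identifiability.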
Theorem 5. Suppose that the hypothesis of Theorem 3 holds for some Markov random field
X based on a triangle-free graph with minimum degree at least 3 and maximum degree d′.
Let V* ⊆ V be such that for any two points v, v′ ∈ V − V* we have d(v, v′) ≥ 3, and suppose
we are given samples from X*, the restriction of X to V*, with which to reconstruct G.
Suppose the following condition also holds: for all v ∈ V, if v₁, v₂ ∈ N(v) and U =
N(v) ∪ N(v₁) − {v, v₁, v₂} and W ⊂ V − (N(v) ∪ N(v₁)) with |W| ≤ 2d, then there exist
some x_{v₁}, x_{v₂}, x′_{v₂}, x_U, x_W such that

    |P(X(v₁) = x_{v₁} | X(U) = x_U, X(W) = x_W, X(v₂) = x_{v₂})
        − P(X(v₁) = x_{v₁} | X(U) = x_U, X(W) = x_W, X(v₂) = x′_{v₂})| > ε    (25)

and

    P(X(U) = x_U, X(W) = x_W, X(v₂) = x_{v₂}) > δ,
    P(X(U) = x_U, X(W) = x_W, X(v₂) = x′_{v₂}) > δ.    (26)

Then for some constant C = C(ε, δ) > 0, if k > Cd log n then there exists an estimator
Ĝ(X*) such that the probability of correct reconstruction is P(G = Ĝ(X*)) = 1 − o(1).
Proof. We apply the algorithm from Theorem 3 to X*, setting the maximum degree as
d = 2d′. The algorithm will output a graph G* = (V*, E*). If {v} ∪ N(v) ⊆ V*, then the
algorithm correctly reconstructs the neighborhood N(v). Any vertex in V* is adjacent to at
most one missing vertex, so suppose that v₁ is a vertex adjacent to a missing vertex v. Then
by conditions (25) and (26) we have that the algorithm reconstructs the neighborhood of v₁
as N(v) ∪ N(v₁) − {v, v₁}. So the edge set E* is exactly all the edges in the induced subgraph
on V* plus a clique connecting all the neighbors of each missing vertex. Since G is triangle-free,
every maximal clique (a clique that cannot be enlarged) of size at least 3 corresponds to a
missing vertex.
So to reconstruct G from G* we simply replace every maximal clique of size at least 3 in
G* with a vertex connected to all the vertices in the clique. This exactly reconstructs the
graph with high probability.
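The clique-replacement step at the end of the proof translates directly into code; a sketch
using networkx (our choice of library, not the authors' tooling):

    import networkx as nx

    def restore_hidden_vertices(g_star):
        # Replace every maximal clique of size >= 3 in G* by a new hidden
        # vertex adjacent to all clique members and delete the clique's
        # internal edges. Valid when G is triangle-free and missing vertices
        # are at pairwise distance >= 3, so these cliques are disjoint.
        g = g_star.copy()
        cliques = [c for c in nx.find_cliques(g_star) if len(c) >= 3]
        for idx, clique in enumerate(cliques):
            h = ("hidden", idx)
            g.remove_edges_from((u, v) for i, u in enumerate(clique)
                                       for v in clique[i + 1:])
            g.add_edges_from((h, u) for u in clique)
        return g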
Remark 3. The condition that missing vertices are at distance at least 3 is not necessary,
but this assumption simplifies the algorithm because the cliques corresponding to missing
vertices are disjoint. A slightly more involved algorithm is able to reconstruct graphs where
the missing vertices have d(v, v ′ ) = 2.
The following lemma shows that the conditions for recovery of missing vertices in Theo-
rem 5 are satisfied by ferromagnetic Ising models whose coupling parameters obey the
bounds below.
Lemma 2. Consider the ferromagnetic Ising model where all coupling parameters satisfy

    c ≤ β_uv ≤ C

on a triangle-free graph G with minimum degree 3. Then the conditions of Theorem 5 are
satisfied with

    ε ≥ tanh(2c)/(32e^{2(d+1)C}(e^{2C} + e^{−2C}))

and

    δ ≥ e^{−4dC}/2^{2d}.
Proof. To check the first condition we write the conditional probability of X(v₁) in terms
of the spins at v and on N = N(v) ∪ N(v₁) − {v, v₁, v₂}, where the last step follows by the
Markov property (since all paths from v₁ to v₂ pass through vertices in N or through v).
Continuing as in the proof of Proposition 2 yields the stated bound on ε, while the bound

    δ ≥ e^{−4dC}/2^{2d}

follows exactly as there.
Acknowledgment E.M. thanks Marek Biskup for helpful discussions on models with
hidden variables.
References
[1] C. K. Chow and C. N. Liu. Approximating discrete probability distributions with
dependence trees. IEEE Trans. Info. Theory, IT-14:462–467, 1968.
[3] P. Abbeel, D. Koller, and A. Ng. Learning factor graphs in polynomial time and sample
complexity. Journal of Machine Learning Research, 7:1743–1788, 2006.
[9] N. Friedman. Inferring cellular networks using probabilistic graphical models. Science,
February 2004.
[11] C. Daskalakis, E. Mossel, and S. Roch. Optimal phylogenetic reconstruction. In
STOC'06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing,
pages 159–168, New York, 2006. ACM.
[12] P. L. Erdős, M. A. Steel, L. A. Székely, and T. A. Warnow. A few logs suffice to build
(almost) all trees (part 1). Random Struct. Algor., 14(2):153–184, 1999.
[13] E. Mossel. Distorted metrics on trees and phylogenetic forests. IEEE/ACM Trans.
Comput. Bio. Bioinform., 4(1):108–116, 2007.