Phylogenetic Trees Bulent Moller CSE 397 18 March 2004
Outline
Distance Matrix
[Figure: example tree over objects A, B, C, D, E with character labels c2, c3, c5, c6]
Perfect Phylogeny Problem
Instance: A set O with n objects, a set C
of m characters, each character having
at most r states (n, m, r positive
integers)
Question: Is there a perfect phylogeny
for O?
If the character state matrix admits a
perfect phylogeny we say that the
defining characters are compatible
Perfect Phylogeny Problem
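Example (k = 2):
Oc1 = { A, C, D, F }
Oc2 = { B, E }
Oc3 = { A, C }
[Figure: after characters c1 and c2 the objects sit in nodes { A, C, D, F } and { B, E }; Oc3 then splits { A, C, D, F } into { A, C } and { D, F }]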
Sketch cont.
Case 2: character k + 1 partitions object
sets belonging to different nodes.
THIS CANNOT HAPPEN
Assume it did. That is only possible if
there exists a character i that places
the objects of nodes a and b in different
nodes. In that case Oi and Ok+1 are
neither disjoint nor is one contained in
the other, contradicting the condition of
the lemma.
Ex:
Oi = { A, C, E }
Ok+1 = { A, B }
[Figure: Oi splits { A, B, C, D, E, F } into { A, C, E } and { B, D, F }; Ok+1 = { A, B } would have to take A from the first node and B from the second]
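The condition used in this example can be checked mechanically. The following is a minimal sketch, assuming each character is represented by the set of objects that have state 1 for it; the function name `compatible` is hypothetical and not part of the slides.

```python
def compatible(o_i, o_j):
    """Return True if the two object sets are disjoint or one contains the other."""
    return o_i.isdisjoint(o_j) or o_i <= o_j or o_j <= o_i


# Sets taken from the example: they share A, but neither contains the other.
O_i = {"A", "C", "E"}
O_k1 = {"A", "B"}
print(compatible(O_i, O_k1))  # False

# Sets from the earlier k = 2 example: {A, C} is nested inside {A, C, D, F}.
print(compatible({"A", "C", "D", "F"}, {"A", "C"}))  # True
```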
Algorithms
For simplicity, we assume that
phylogenetic tree construction
works in two phases:
Decision
Construction
Algorithms for Decisions
The most basic algorithm:
Check whether the input matrix satisfies
the Lemma.
How would you do that?
Basic Decision Algorithm
Check every pair of columns for being
disjoint or for one being a subset of
the other.
Each of these checks costs O(n), and
there are O(m^2) column pairs:
O(nm^2)
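The pairwise check could be sketched as follows. This assumes the matrix is a list of rows of 0/1 values with one column per character; the function name is hypothetical. With m(m-1)/2 column pairs and O(n) work per pair, this direct version is quadratic in the number of characters.

```python
from itertools import combinations


def admits_perfect_phylogeny_basic(M):
    """Pairwise column test on a binary matrix M (rows = objects, columns = characters)."""
    n = len(M)
    m = len(M[0]) if n else 0
    # O_j: set of row indices whose state for character j is 1
    columns = [{i for i in range(n) if M[i][j] == 1} for j in range(m)]
    for a, b in combinations(columns, 2):
        if not (a.isdisjoint(b) or a <= b or b <= a):
            return False  # a and b overlap without nesting
    return True


# Rows A..F, columns c1 = {A,C,D,F}, c2 = {B,E}, c3 = {A,C} (sets from the slides)
good = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
# Same matrix with an extra, illustrative column {A, B} that conflicts with c1
bad = [row + [1 if i in (0, 1) else 0] for i, row in enumerate(good)]
print(admits_perfect_phylogeny_basic(good), admits_perfect_phylogeny_basic(bad))
```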
Decision Algorithms
Improvement
Visit every column only once to get
complexity O(nm).
First process the characters for which the
largest number of objects has state 1.
Every other character is then either a subset
of an already processed one or disjoint from it.
Output: true if M admits a perfect
phylogeny, false otherwise
// Sort columns by number of 1's, in decreasing order
// Initialize auxiliary matrix L
for each Lij do
    Lij ← 0
Algorithms Perfect
Phylogeny Decision
for i ← 1 to n do
    k ← -1
    for j ← 1 to m do
        if Mij = 1 then
            Lij ← k
            k ← j
Algorithms Perfect
Phylogeny
Decision
for each column j of L do
    if Lij ≠ Li'j for some rows i, i'
    and both Lij and Li'j are nonzero
    then return false
return true
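The decision procedure above might look like this in Python. It is a sketch under a few assumptions: the matrix is a list of 0/1 rows, the function name is hypothetical, and because Python columns are 0-based the code stores j + 1 in L so that 0 can keep its meaning of "no entry". Python's comparison sort also adds an O(m log m) term that a counting sort would avoid.

```python
def admits_perfect_phylogeny(M):
    """Decision test using the auxiliary matrix L (rows = objects, columns = characters)."""
    n = len(M)
    m = len(M[0]) if n else 0
    # Sort columns by decreasing number of 1s (stable for ties).
    order = sorted(range(m), key=lambda j: -sum(M[i][j] for i in range(n)))
    S = [[M[i][j] for j in order] for i in range(n)]

    # L[i][j] = 1-based index of the previous 1 in row i, -1 if none, 0 if S[i][j] = 0
    L = [[0] * m for _ in range(n)]
    for i in range(n):
        k = -1
        for j in range(m):
            if S[i][j] == 1:
                L[i][j] = k
                k = j + 1

    # All nonzero entries of a column must agree.
    for j in range(m):
        values = {L[i][j] for i in range(n) if L[i][j] != 0}
        if len(values) > 1:
            return False
    return True


# Rows A..F, columns c1 = {A,C,D,F}, c2 = {B,E}, c3 = {A,C} (sets from the slides)
good = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
# Same matrix with an extra, illustrative column {A, B}
bad = [row + [1 if i in (0, 1) else 0] for i, row in enumerate(good)]
print(admits_perfect_phylogeny(good), admits_perfect_phylogeny(bad))
```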
Algorithms Perfect
Phylogeny
Construction
Input: binary matrix M with
for each object i do
    curNode ← root
    for j ← 1 to m do
        if Mij = 1 then
            if there already exists an edge (curNode, u)
            labeled j then curNode ← u
            else create node u, create edge (curNode, u)
            labeled j, curNode ← u
    place i in curNode
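A possible Python rendering of the construction loop is shown below. It assumes the matrix already passed the decision test and that its columns are ordered by decreasing number of 1s; the dictionary-based node layout and the names `build_perfect_phylogeny` and `names` are illustrative choices, not taken from the slides. Edge labels are 0-based column indices.

```python
def build_perfect_phylogeny(M, names):
    """Insert each object along the path spelled by its 1-entries, creating nodes as needed."""
    root = {"children": {}, "objects": []}
    for i, row in enumerate(M):
        cur = root
        for j, state in enumerate(row):
            if state == 1:
                # Follow the edge labeled j if it exists, otherwise create it.
                if j not in cur["children"]:
                    cur["children"][j] = {"children": {}, "objects": []}
                cur = cur["children"][j]
        cur["objects"].append(names[i])  # place object i in the node reached
    return root


# Rows A..F, columns c1 = {A,C,D,F}, c2 = {B,E}, c3 = {A,C} (sets from the slides)
M = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
tree = build_perfect_phylogeny(M, ["A", "B", "C", "D", "E", "F"])
print(tree["children"][0]["objects"])                 # objects below edge c1 only
print(tree["children"][0]["children"][2]["objects"])  # objects below c1 then c3
print(tree["children"][1]["objects"])                 # objects below edge c2
```

In this sketch objects with identical rows end up in the same node, so a node may hold several objects.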
[Figure: example tree with edges labeled c2, c3, c4]
Compatibility is NP-Complete cont.
G contains a clique V' with |V'| ≥ K iff M
contains a compatible character subset L
with |L| ≥ K.
If such a clique exists, then to every edge of this
clique there corresponds a pair of characters in
M such that whenever one of them has state 1
for an object, the other has state 0, or both have
state 0.
If L exists, then to every pair of characters of L
there corresponds a pair of vertices in V linked by
an edge. All these pairs together form a clique of
size at least K.
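The correspondence between cliques and compatible character subsets can be illustrated from the matrix side. The sketch below is not the reduction itself: it assumes binary characters, builds a graph with one vertex per character and an edge for every compatible pair, and then searches for the largest clique by brute force, which takes exponential time. Function names are hypothetical.

```python
from itertools import combinations


def compatibility_graph(M):
    """Edges (a, b), a < b, between characters whose object sets are disjoint or nested."""
    n, m = len(M), len(M[0])
    cols = [{i for i in range(n) if M[i][j] == 1} for j in range(m)]
    edges = set()
    for a, b in combinations(range(m), 2):
        if cols[a].isdisjoint(cols[b]) or cols[a] <= cols[b] or cols[b] <= cols[a]:
            edges.add((a, b))
    return edges


def largest_compatible_subset(M):
    """Brute-force search for a largest clique in the compatibility graph."""
    m = len(M[0])
    edges = compatibility_graph(M)
    for size in range(m, 0, -1):
        for subset in combinations(range(m), size):
            if all(pair in edges for pair in combinations(subset, 2)):
                return set(subset)
    return set()


# Rows A..F, columns c1 = {A,C,D,F}, c2 = {B,E}, c3 = {A,C}, plus an illustrative c4 = {A,B}
M = [[1, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
print(compatibility_graph(M))
print(largest_compatible_subset(M))
```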