Phylogenetic Trees Bulent Moller CSE 397 18 March 2004


PHYLOGENETIC TREES

Bulent Moller
CSE 397
18 March 2004
Outline
Recall phylogenetic trees
Character states and the perfect phylogeny problem
Binary character states
Compatibility is NP-complete
Recall
Motivation
The problem of explaining the
evolutionary history of today's species
How do species relate to one another
in terms of common ancestors
Nucleic acids and Proteins also evolve
Approaches
Fossil records, phylogenetic trees
Recall
In phylogenetic trees
Leaves represent present-day species
Interior nodes represent hypothesized ancestors
Features of Phylogenetic Trees
Shows how interior nodes connect to one another and to the leaves
What does it tell the biologist?
Shows the distance between pairs of nodes when the tree edges are weighted
What does it tell the biologist?
Input data for Phylogenetic
Reconstruction

Distance Matrix

Character State Matrix


Character State Matrix
A character has a finite number of states
The taxonomical units for which we want to create a phylogeny are called objects
e.g. species, populations
Every object has a state vector; all objects share the same characters but not the same states!
Character State Matrix M
M has n rows (objects)
M has m columns (characters)
Mij denotes the state object i has for character j
Problems while
constructing Phylogenetic
Trees
Convergence or Parallel evolution
e.g. Presence of Wings in Birds and
Bats
Reversals
e.g. Snakes
Unordered characters
Assumptions
There is no Convergence
There is no Reversal
Characters will be ordered
0 to 1
Our Character state Matrix will be
Binary
Perfect Phylogeny Tree
Defn: A tree T is a perfect phylogeny if
for each state s of each character c, the set of all nodes u whose state for c is s forms a subtree of T.
In particular, the edge e leading to this subtree is uniquely associated with a transition from some state w to state s
This obeys our assumptions (no convergence, no reversal)
Ex: Perfect Phylogeny tree
[Figure: a perfect phylogeny whose edges are labeled c1 to c6 and whose leaves are the objects A, B, C, D, E]
Perfect Phylogeny Problem
Instance: A set O with n objects, a set C
of m characters, each character having
at most r states (n, m, r positive
integers)
Question: Is there a perfect phylogeny
for O?
If the character state matrix admits a
perfect phylogeny we say that the
defining characters are compatible
Perfect Phylogeny Problem

Can we determine the root for every problem (input)?
No, we may not have enough information
The tree will be unrooted!
Ex: Unrooted Binary Tree
An unrooted binary tree does not imply a known ancestral root.
This tree has 3 possible rooted binary trees with one common ancestor
Binary Character States
Defn: For each column j of M, let Oj be the set of objects whose state is 1 for j. Let Ōj (the complement of Oj) be the set of objects whose state is 0 for j.
Oc1 = {B, D}
Ōc1 = {A, C, E}
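The definition can be tried out on a small matrix. The sketch below is only an illustration: the matrix M, the names OBJECTS and CHARACTERS, and the helper state_sets are assumptions introduced here, chosen so that column c1 agrees with Oc1 = {B, D} above. The matrix shown on the original slide is not recoverable from this text.

```python
# Illustrative binary character state matrix (an assumption, not the slide's matrix).
# Rows are objects, columns are characters c1..c6.
OBJECTS = ["A", "B", "C", "D", "E"]
CHARACTERS = ["c1", "c2", "c3", "c4", "c5", "c6"]
M = [
    [0, 0, 0, 1, 1, 0],  # A
    [1, 1, 0, 0, 0, 0],  # B
    [0, 0, 0, 1, 1, 1],  # C
    [1, 0, 1, 0, 0, 0],  # D
    [0, 0, 0, 1, 0, 0],  # E
]


def state_sets(matrix, objects, j):
    """Return (Oj, complement of Oj) for the column with 0-based index j."""
    ones = {obj for obj, row in zip(objects, matrix) if row[j] == 1}
    zeros = set(objects) - ones
    return ones, zeros


o_c1, not_o_c1 = state_sets(M, OBJECTS, 0)
print(sorted(o_c1), sorted(not_o_c1))
```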
Lemma
A binary matrix M admits a perfect phylogeny if and only if, for each pair of characters i and j, the sets Oi and Oj are disjoint or one of them contains the other
Sketch
We will show the if part of the lemma by inductively building a rooted perfect phylogeny.
Assume we have only 1 character, as shown in the matrix
Sketch cont.
According to the given matrix, Oc1 = {B, D} and Ōc1 = {A, C, E} (the objects with state 0 for c1)
Create a root and the nodes Oc1 and Ōc1
Link node Oc1 to the root by an edge labeled c1, and link node Ōc1 to the root by an unlabeled edge
Split each child of the root into as many leaves as there are objects in the node
Sketch cont.
Suppose we have built a tree T for k characters
There are no leaves yet; nodes still contain sets of objects
Process character k + 1
case 1: character k + 1 partitions only an object set belonging to a single node
This does not violate the perfect phylogeny property
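Ex: (k = 2)
Oc1 = {A, C, D, F}
Oc2 = {B, E}
Oc3 = {A, C}
[Figure: root {A, B, C, D, E, F}; edge c1 leads to node {A, C, D, F} and edge c2 leads to node {B, E}; character c3 splits node {A, C, D, F} into {A, C} and {D, F}]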
Sketch cont.
case 2: character k + 1 partitions object sets belonging to different nodes
THIS CANNOT HAPPEN
Assume it did. Then there exists a character i that already placed the objects of nodes a and b in different nodes, so Oi and Ok+1 are neither disjoint nor is one contained in the other, which contradicts the condition of the lemma.
Ex:
Oi = {A, C, E}
Ok+1 = {A, B}
[Figure: character i splits the root {A, B, C, D, E, F} into nodes {A, C, E} and {B, D, F}; Ok+1 = {A, B} would take A from the first node and B from the second]
Algorithms
For Simplicity we assume that the
Phylogenetic tree construction
works in 2 phases
Decision
Construction
Algorithms for Decisions
The very basic Algorithm:
Check if the input Matrix obeys
Lemma
How would you do that?
Basic Decision Algorithm
Check every pair of columns for being disjoint or for one being a subset of the other
One such check costs O(n), and there are m(m-1)/2 column pairs
O(nm²)
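A minimal sketch of this pairwise check, assuming the matrix is given as a list of 0/1 rows. The function names columns_as_sets, pair_is_compatible and admits_perfect_phylogeny_naive are made up for this illustration.

```python
from itertools import combinations


def columns_as_sets(matrix):
    """Turn each column j into the set Oj of row indices whose state is 1."""
    n_cols = len(matrix[0]) if matrix else 0
    return [{i for i, row in enumerate(matrix) if row[j] == 1}
            for j in range(n_cols)]


def pair_is_compatible(a, b):
    """Condition of the lemma: disjoint, or one set contains the other."""
    return a.isdisjoint(b) or a <= b or b <= a


def admits_perfect_phylogeny_naive(matrix):
    """Test all m(m-1)/2 column pairs; each set test touches at most n objects."""
    sets = columns_as_sets(matrix)
    return all(pair_is_compatible(a, b) for a, b in combinations(sets, 2))


# Objects A..F with two characters: Oi = {A, C, E} and Ok+1 = {A, B}.
conflicting = [[1, 1], [0, 1], [1, 0], [0, 0], [1, 0], [0, 0]]
print(admits_perfect_phylogeny_naive(conflicting))
```

The two columns share object A, yet neither contains the other, so the check rejects this matrix.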
Decision Algorithms
Improvement
Visit every column only once, for a complexity of O(nm)
Process first the characters for which the largest number of objects has state 1
If the lemma's condition holds, every other character is either a subset of such a character or disjoint from it
Algorithm: Perfect Phylogeny Decision
Input: binary matrix M
Output: true if M admits a perfect phylogeny, false otherwise
// Sort the columns of M in decreasing order of the number of 1s
// Initialize auxiliary matrix L
for each Lij do
    Lij ← 0
// Compute L
for i ← 1 to n do
    k ← -1
    for j ← 1 to m do
        if Mij = 1 then
            Lij ← k
            k ← j
// Check the columns of L
for each column j of L do
    if Lij ≠ Lrj for some rows i and r, and both Lij and Lrj are nonzero, then
        return false
return true
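The decision procedure can be sketched in Python as follows. This is an illustration under assumptions, not the lecture's own code: the function names are invented here, the matrix is a list of 0/1 rows, and column indices are stored 1-based in L so that 0 keeps meaning "not set", as in the pseudocode.

```python
def sort_columns_by_ones(matrix):
    """Reorder the columns so that the number of 1s per column is non-increasing."""
    n_cols = len(matrix[0]) if matrix else 0
    order = sorted(range(n_cols), key=lambda j: -sum(row[j] for row in matrix))
    return [[row[j] for j in order] for row in matrix]


def perfect_phylogeny_decision(matrix):
    """Return True if the binary matrix admits a perfect phylogeny."""
    sorted_m = sort_columns_by_ones(matrix)
    n = len(sorted_m)
    m = len(sorted_m[0]) if n else 0

    # L[i][j] records the closest column to the left of j in which row i has a 1
    # (1-based), or -1 if there is none; 0 means row i has state 0 in column j.
    L = [[0] * m for _ in range(n)]
    for i in range(n):
        k = -1
        for j in range(m):
            if sorted_m[i][j] == 1:
                L[i][j] = k
                k = j + 1  # 1-based index of the current column

    # All nonzero entries of a column must agree.
    for j in range(m):
        nonzero = {L[i][j] for i in range(n) if L[i][j] != 0}
        if len(nonzero) > 1:
            return False
    return True


compatible_example = [
    [0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
print(perfect_phylogeny_decision(compatible_example))
```

After the sort, filling L and scanning its columns each visit every entry once, which is where the O(nm) bound comes from.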
Algorithm: Perfect Phylogeny Construction
Input: binary matrix M with columns sorted in decreasing order of the number of 1s
Output: perfect phylogeny for M
Create root
for each object i do
    curNode ← root
    for j ← 1 to m do
        if Mij = 1 then
            if there already exists an edge (curNode, u) labeled j then
                curNode ← u
            else
                Create node u
                Create edge (curNode, u) labeled j
                curNode ← u
    Place i in curNode
for each node u except root do
    Create as many leaves linked to u as there are objects in u
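A possible Python rendering of the construction phase is given below. The Node class, build_perfect_phylogeny and the example data are assumptions made for this sketch. As a simplification, leaves are not created as separate nodes: each node keeps the list of objects placed in it, and every such object stands for one leaf hanging off that node. The input is assumed to have passed the decision phase and to have its columns already sorted.

```python
class Node:
    """Tree node: children are keyed by the edge label (a column index)."""

    def __init__(self):
        self.children = {}  # edge label -> child Node
        self.objects = []   # objects placed here; each one stands for a leaf


def build_perfect_phylogeny(matrix, objects):
    """Insert every object along the path spelled by the 1s of its row."""
    root = Node()
    for obj, row in zip(objects, matrix):
        cur = root
        for j, state in enumerate(row):
            if state == 1:
                if j not in cur.children:  # no edge labeled j yet: create it
                    cur.children[j] = Node()
                cur = cur.children[j]
        cur.objects.append(obj)
    return root


# Illustrative matrix (an assumption); columns are already sorted by number of 1s.
LABELS = ["c4", "c1", "c5", "c2", "c3", "c6"]
OBJECTS = ["A", "B", "C", "D", "E"]
SORTED_M = [
    [1, 0, 1, 0, 0, 0],  # A
    [0, 1, 0, 1, 0, 0],  # B
    [1, 0, 1, 0, 0, 1],  # C
    [0, 1, 0, 0, 1, 0],  # D
    [1, 0, 0, 0, 0, 0],  # E
]
tree = build_perfect_phylogeny(SORTED_M, OBJECTS)
```

Each object walks at most m edges, so the construction also runs in O(nm) when edge lookup by label takes constant time.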
Compatibility In
Phylogenies
Recall that by not allowing convergence and reversals we depart from the real evolutionary process
One approach is to keep insisting on no reversals and no convergence, and to exclude the few characters that cause them.
Compatibility In
Phylogenies
Goal:
Find a maximum set of characters for which a perfect phylogeny exists
Problem: Compatibility
Instance: A character state matrix M with n objects and m directed binary characters, and a positive integer B ≤ m
Question: Is there a subset L of the characters with |L| ≥ B such that, for each pair of characters i and j in L, the sets Oi and Oj are disjoint or one of them contains the other?
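To make the question concrete, here is a brute-force sketch that simply tries subsets of characters. It takes exponential time in the worst case and is meant only as an illustration of the problem statement, not as a practical method; the function names are assumptions of this sketch.

```python
from itertools import combinations


def compatible(a, b):
    """Two characters are compatible if their 1-sets are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def has_compatible_subset(matrix, bound):
    """Is there a set of at least `bound` pairwise compatible characters?

    Any subset of a compatible set is compatible, so it is enough to look
    at subsets of size exactly `bound`.
    """
    m = len(matrix[0]) if matrix else 0
    sets = [frozenset(i for i, row in enumerate(matrix) if row[j] == 1)
            for j in range(m)]
    for subset in combinations(range(m), bound):
        if all(compatible(sets[i], sets[j]) for i, j in combinations(subset, 2)):
            return True
    return False


# Characters 0 and 1 conflict; character 2 is compatible with both.
example = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(has_compatible_subset(example, 2), has_compatible_subset(example, 3))
```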
Compatibility In
Phylogenies
Problem: Clique
Instance: Graph G = (V, E) and a positive integer K ≤ |V|
Question: Does G contain a subset V' of V with |V'| ≥ K such that every pair of vertices in V' is linked by an edge in E?
Clique is NP-complete
Ex: Clique
[Figure: graph on the vertices C1, C2, C3, C4]
Which nodes build a clique with k = 3?
Compatibility is NP
Complete
Proof: Create an instance of Compatibility from an instance of Clique as follows:
Given G = (V, E), let m = |V|; for every vertex vi in V we create a character i in M
The number of objects of M is n = 3m(m-1)/2
For every pair (vi, vj) that is not an edge in E we create three objects r, s, t in M such that
Mri = 0, Msi = 1, Mti = 1, Mrj = 1, Msj = 1, Mtj = 0
The remaining elements of M are zero
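The construction of M can be sketched as below. The function names and the example graph are assumptions of this illustration. The sketch emits only the three rows per non-adjacent pair; padding M with all-zero rows up to n = 3m(m-1)/2 would not change any set Oj, so it is left out here.

```python
from itertools import combinations


def clique_to_compatibility(num_vertices, edges):
    """Build M: one column per vertex, three rows (r, s, t) per non-adjacent pair."""
    edge_set = {frozenset(e) for e in edges}
    rows = []
    for i, j in combinations(range(num_vertices), 2):
        if frozenset((i, j)) not in edge_set:
            r = [0] * num_vertices
            s = [0] * num_vertices
            t = [0] * num_vertices
            r[j] = 1         # Mri = 0, Mrj = 1
            s[i] = s[j] = 1  # Msi = 1, Msj = 1
            t[i] = 1         # Mti = 1, Mtj = 0
            rows.extend([r, s, t])
    return rows


def column_set(matrix, j):
    """Set of row indices whose state is 1 in column j."""
    return {i for i, row in enumerate(matrix) if row[j] == 1}


# Hypothetical graph: triangle 0-1-2 plus the edge 2-3.
M = clique_to_compatibility(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
o0, o1, o2, o3 = (column_set(M, j) for j in range(4))
```

In this example the characters of the triangle's vertices have pairwise disjoint sets, while the characters of the non-adjacent vertices 0 and 3 share object s without one set containing the other.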
Example
[Figure: example graph on the vertices C1, C2, C3, C4]
Compatibility is NP
Complete cont.
G contains a clique V' with |V'| ≥ K iff M contains a compatible character subset L with |L| ≥ K
If such a clique exists, then to every edge of this clique there corresponds a pair of characters in M such that whenever one of them has state 1 for an object, the other has state 0 (or both have state 0); that is, their sets are disjoint.
If L exists, then to every pair of characters of L there corresponds a pair of vertices in V linked by an edge. All these pairs together form a clique of size |L|.
