PAM and BLOSUM
Example taken from Borodovsky & Ekisheva (2007) Problems and Solutions in Biological Sequence Analysis. Cambridge Univ. Press.

[Figure: phylogenetic tree of the example. The ancestral sequence ABGH gives ABGH (0 substitutions) and ABIJ (substitutions G-I and H-J). ABGH gives ACGH (B-C) and DBGH (A-D). ABIJ gives ADIJ (B-D) and CBIJ (A-C). The successive slides show alternative trees for the same sequences and the substitutions counted on each branch.]

Aligned pairs of sequences (ancestor / descendant) used to count the substitutions:
A B G H / A B G H
A B G H / A B I J
A B G H / A C G H
A B G H / D B G H
A B I J / A D I J
A B I J / C B I J
$$ m_j = \frac{\text{number of changes of } j}{\text{number of occurrences of } j} $$
Relative amino acid mutability values m_j for our example

Amino acid                  A      B      C      D      G      H      I      J
Changes (substitutions)     8      8      8      8      4      4      4      4
Frequency of occurrence    40     40      8      8     24     24     24     24
Relative mutability m_j   0.2    0.2      1      1  0.167  0.167  0.167  0.167
The relative mutability accounts for the fact that different amino acids mutate at different rates. It can be read as the probability that a given amino acid mutates.
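The ratio above can be sketched in a few lines of Python. The counts for A (8 changes out of 40 occurrences) are those quoted in the example; the 4 changes for G are an assumption inferred from its mutability of 0.167 and its 24 occurrences. The function name is hypothetical.

```python
# Counts for the worked example. A: 8 changes out of 40 occurrences (from the slides).
# G: 24 occurrences (from the slides); 4 changes is an inferred value, not a quoted one.
changes = {"A": 8, "G": 4}
occurrences = {"A": 40, "G": 24}


def relative_mutability(changes, occurrences):
    """Return m_j = (number of changes of j) / (number of occurrences of j)."""
    return {aa: changes[aa] / occurrences[aa] for aa in occurrences}


m = relative_mutability(changes, occurrences)
for aa, value in m.items():
    print(f"m_{aa} = {value:.3f}")
```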
Relative mutabilities m_i of the 20 amino acids

aa     m_i      aa     m_i
Asn    134      His     66
Ser    120      Arg     65
Asp    106      Lys     56
Glu    102      Pro     56
Ala    100      Gly     49
Thr     97      Tyr     41
Ile     96      Phe     41
Met     94      Leu     40
Gln     93      Cys     20
Val     74      Trp     18
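$$ f_j = k \sum_b q_j^{(b)} N^{(b)} $$
where $q_j^{(b)}$ is the compositional frequency of amino acid $j$ in block $b$, $N^{(b)}$ is the number of substitutions in block $b$, and the coefficient $k$ is chosen to ensure that the frequencies sum to one ($\sum_j f_j = 1$).
In our example, there is only one block, therefore the effective frequencies are equal to the compositional frequencies ($f_j = q_j$):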
Amino acid       A      B      C      D      G      H      I      J
Frequency f  0.125  0.125  0.125  0.125  0.125  0.125  0.125  0.125
Amino acid  Frequency f    Amino acid  Frequency f    Amino acid  Frequency f
Gly         0.089          Pro         0.051          Ile         0.037
Ala         0.087          Glu         0.050          His         0.034
Leu         0.085          Asp         0.047          Cys         0.033
Lys         0.081          Arg         0.041          Tyr         0.030
Ser         0.070          Asn         0.040          Met         0.015
Val         0.065          Phe         0.040          Trp         0.010
Thr         0.058          Gln         0.038
Non-diagonal elements of M:
$$ M_{ij} = \frac{\lambda \, m_j A_{ij}}{\sum_k A_{kj}} $$
Diagonal elements of M:
$$ M_{ii} = 1 - \lambda \, m_i $$
In these equations, m is the relative mutability and A is the matrix of accepted point mutations. The constant λ represents a degree of freedom that can be used to connect the matrix M with an evolutionary time scale.
In our example, for amino acid A (see matrix A): A_BA = 0, A_CA = 4, A_DA = 4.
Amino acid A is substituted in 8/40 of the cases (this is its mutability m_A) and remains unchanged in 32/40 of the cases.
The coefficient λ can be adjusted to ensure that a specific (small) number of substitutions occurs on average per hundred residues. Dayhoff et al. made this adjustment in the following way. The expected number of amino acids that remain unchanged in a protein sequence 100 amino acids long is given by:
$$ 100 \sum_j f_j M_{jj} = 100 \sum_j f_j (1 - \lambda \, m_j) $$
If only one substitution per hundred residues is allowed, then λ is calculated from the equation:
$$ 100 \sum_j f_j (1 - \lambda \, m_j) = 99 $$
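PAM1 matrix for our example (column j: original amino acid; row i: replacement; empty cells of the slide shown as 0):

         A       B       C       D       G       H       I       J
A   0.9948       0  0.0131  0.0131       0       0       0       0
B        0  0.9948  0.0131  0.0131       0       0       0       0
C   0.0026  0.0026  0.9740       0       0       0       0       0
D   0.0026  0.0026       0  0.9740       0       0       0       0
G        0       0       0       0  0.9957       0  0.0043       0
H        0       0       0       0       0  0.9957       0  0.0043
I        0       0       0       0  0.0043       0  0.9957       0
J        0       0       0       0       0  0.0043       0  0.9957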
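The construction of the PAM1 matrix from the mutabilities, the accepted point mutations and the frequencies can be sketched as follows. This is a minimal illustration, not the original authors' code. The counts A-C = A-D = 4 come from the example; the other pair counts and the mutabilities of C and D (taken as 1) are assumptions chosen to be consistent with the matrix values shown on the slide. The function names are hypothetical.

```python
AMINO_ACIDS = ["A", "B", "C", "D", "G", "H", "I", "J"]

# Accepted point mutations (symmetric counts). A-C and A-D are quoted in the
# example; the remaining counts are assumptions made for this sketch.
PAIR_COUNTS = {("A", "C"): 4, ("A", "D"): 4, ("B", "C"): 4, ("B", "D"): 4,
               ("G", "I"): 4, ("H", "J"): 4}

# Relative mutabilities m_j. The values for C and D are assumptions.
MUTABILITY = {"A": 0.2, "B": 0.2, "C": 1.0, "D": 1.0,
              "G": 4 / 24, "H": 4 / 24, "I": 4 / 24, "J": 4 / 24}

# Frequencies f_j of the example (they sum to 1).
FREQUENCY = {aa: 0.125 for aa in AMINO_ACIDS}


def accepted_count(i, j):
    """Return A_ij, looking the pair up in either order."""
    return PAIR_COUNTS.get((i, j), PAIR_COUNTS.get((j, i), 0))


def pam1(amino_acids, mutability, frequency, unchanged=0.99):
    """Build M with sum_j f_j M_jj = unchanged (frequencies must sum to 1)."""
    lam = (1.0 - unchanged) / sum(frequency[j] * mutability[j] for j in amino_acids)
    matrix = {}
    for j in amino_acids:  # column j: original amino acid
        column_total = sum(accepted_count(k, j) for k in amino_acids)
        for i in amino_acids:  # row i: replacement
            if i == j:
                matrix[i, j] = 1.0 - lam * mutability[j]
            elif column_total == 0:
                matrix[i, j] = 0.0  # j was never observed to change
            else:
                matrix[i, j] = lam * mutability[j] * accepted_count(i, j) / column_total
    return lam, matrix


lam, M = pam1(AMINO_ACIDS, MUTABILITY, FREQUENCY)
print(f"lambda = {lam:.4f}")
for i in AMINO_ACIDS:
    print(i, " ".join(f"{M[i, j]:.4f}" for j in AMINO_ACIDS))
```

Each column of the resulting matrix sums to 1, and the expected number of unchanged residues per hundred is 99 by construction. Small differences in the last digit with respect to the slide come from rounding.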
[Table: PAM1 mutation probability matrix for the 20 amino acids (A R N D C Q E G H I L K M F P S T W Y V), values multiplied by 10,000. Annotations on the slide: line 3 of PAM1, column 17 of PAM1.]
Source: J. van Helden
$$ \lim_{n \to \infty} M^n = \begin{pmatrix} f_A & f_A & \cdots & f_A \\ f_R & f_R & \cdots & f_R \\ \vdots & \vdots & & \vdots \\ f_V & f_V & \cdots & f_V \end{pmatrix} $$
[Table: PAM250 mutation probability matrix for the 20 amino acids (values in %).]
[Figure: after 250 PAM, an alanine (A) is found as A with a probability of 13%, as R with a probability of 3%, as N with a probability of 4%, ..., and as W with a probability of 0%.]
Odds score (relatedness):
$$ r_n(i,j) = \frac{M^n_{ji}}{f_j} = \frac{P_{ji,n}}{f_i f_j} $$
$P_{ji,n} = f_i M^n_{ji}$ is the probability that two aligned amino acids have diverged from a common ancestor n/2 PAM units ago, assuming that the substitutions follow a Markov process (for details, see Borodovsky & Ekisheva, 2007).
Note that R (the odds-score or relatedness matrix) is a symmetric matrix.
Log-odds score:
$$ s_n(i,j) = \log \frac{M^n_{ji}}{f_j} = \log \frac{P_{ji,n}}{f_i f_j} $$
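The two steps above (raising the PAM1 matrix to the power n, then taking the log-odds) can be sketched as follows. To keep the example short, a 4-letter stand-in matrix with 0.99 on the diagonal and uniform frequencies is assumed instead of the 20 x 20 amino acid matrix. The base of the logarithm and the scaling factor are left as parameters because the formula above does not fix them; the function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, n):
    """Return M^n for an integer n >= 0 by repeated squaring."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = [row[:] for row in m]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


def log_odds(m_n, freqs, base=2.0, scale=1.0):
    """Return s_n(i, j) = scale * log_base(M^n_ji / f_j) as a list of rows."""
    size = len(freqs)
    return [[scale * math.log(m_n[j][i] / freqs[j], base) for j in range(size)]
            for i in range(size)]


# Stand-in PAM1 matrix (assumption): 4 letters, 0.99 on the diagonal,
# the remaining 0.01 shared equally between the 3 possible substitutions.
pam1 = [[0.99 if i == j else 0.01 / 3 for j in range(4)] for i in range(4)]
freqs = [0.25] * 4

pam10 = mat_pow(pam1, 10)
scores = log_odds(pam10, freqs)
print(round(scores[0][0], 2), round(scores[0][1], 2))
```

With the real 20 x 20 PAM1 matrix and the amino acid frequencies, the same functions would give the PAM-n log-odds matrix for any n, for example n = 250.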
[Table: PAM250 matrix (log-odds), shown first with the amino acids in the order A R N D C Q E G H I L K M F P S T W Y V, then with the amino acids regrouped by physico-chemical class in the order C S T P A G N D E Q H R K M I L V F Y W.]
Classes used in the regrouped matrix:
Hydrophobic: C, P, A, G, M, I, L, V
Aromatic: H, F, Y
Polar: S, T, N, Q, Y
Basic: H, R, K
Acidic: D, E
Source: J. van Helden
identity (%)   difference d (%)   PAM index n
99              1                   1
95              5                   5
90             10                  11
85             15                  17
80             20                  23
75             25                  30
70             30                  38
60             40                  56
50             50                  80
40             60                 112
30             70                 159
20             80                 246
14             86                 350
The low-identity end of the table is marked as the twilight zone (detection limit).
Remark:
In the PAM matrices, the index n indicates the number of accepted substitutions per 100 positions. It can exceed 100 because the same position may be substituted several times.
Higher indexes are more appropriate for more distant proteins (PAM250 is better than PAM100 for distant proteins).
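[Figure: construction of the BLOSUM matrices. Conserved motifs (blocks) are taken from 504 groups of related proteins. Within each block, sequences sharing a high identity are grouped into clusters: at least 62% identity for BLOSUM62, at least 80% for BLOSUM80.]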
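Example: a block of 7 sequences grouped into 3 clusters. Each sequence receives a weight equal to 1 / (size of its cluster).

Cluster 1 (weight: 1/2)
D C A A
D C A A
Cluster 2 (weight: 1/2)
A A C C
D A C C
Cluster 3 (weight: 1/3)
A D A D
A D C D
A C C D

Weighted pair counts f_ij, summed over the 1st, 2nd, 3rd and 4th columns (pairs are counted between sequences of different clusters):

        A      C      D
A     5/6   13/3   11/3
C    13/3      1    5/3
D    11/3    5/3    1/2

Observed frequencies of the pairs:
$$ q_{ij} = \frac{f_{ij}}{\sum_{i} \sum_{j=1}^{i} f_{ij}} $$
$$ \sum_{i} \sum_{j=1}^{i} f_{ij} = f_{AA} + f_{AC} + f_{AD} + f_{CC} + f_{CD} + f_{DD} = 12 $$
Examples:
q_AA = f_AA/12 = 5/72
q_CA = f_CA/12 = f_AC/12 = 13/36

q_ij    A       C      D
A     5/72
C    13/36   1/12
D    11/36   5/36   1/24

Expected frequencies of occurrence of a pair (i,j):
$$ p_i = q_{ii} + \frac{1}{2} \sum_{j \neq i} q_{ij} $$
$$ e_{ii} = p_i^2 \qquad e_{ij} = 2 \, p_i \, p_j \quad (i \neq j) $$

e_ij       A        C        D
A     0.1622   0.2683   0.2125
C              0.1108   0.1757
D                       0.0696

Scores:
$$ s_{ij} = 2 \log_2 \frac{q_{ij}}{e_{ij}} $$

s_ij    A    C    D
A      -2    1    1
C       1   -1   -1
D       1   -1   -1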
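The BLOSUM calculation for this small block can be checked with the sketch below. It assumes that pairs are counted only between sequences of different clusters, each pair weighted by the product of the two sequence weights, which reproduces the counts of the example. Exact fractions are used for f and q; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import log2

# The three clusters of the example block.
clusters = [
    ["DCAA", "DCAA"],
    ["AACC", "DACC"],
    ["ADAD", "ADCD", "ACCD"],
]


def pair_counts(clusters):
    """Weighted counts f_ij of aligned pairs taken between different clusters."""
    counts = {}
    for c1, c2 in combinations(clusters, 2):
        weight = Fraction(1, len(c1)) * Fraction(1, len(c2))
        for s1 in c1:
            for s2 in c2:
                for a, b in zip(s1, s2):  # column by column
                    key = tuple(sorted((a, b)))
                    counts[key] = counts.get(key, 0) + weight
    return counts


f = pair_counts(clusters)
total = sum(f.values())
q = {pair: count / total for pair, count in f.items()}

letters = sorted({aa for pair in q for aa in pair})
# p_i = q_ii + 1/2 * sum of q_ij over j != i
p = {i: q.get((i, i), 0)
        + sum(v for pair, v in q.items() if i in pair and pair[0] != pair[1]) / 2
     for i in letters}


def expected(i, j):
    """e_ii = p_i^2 and e_ij = 2 p_i p_j for i != j."""
    return p[i] ** 2 if i == j else 2 * p[i] * p[j]


s = {pair: round(2 * log2(float(q[pair] / expected(*pair)))) for pair in q}

for pair in sorted(q):
    print(pair, f[pair], q[pair], f"{float(expected(*pair)):.4f}", s[pair])
```

The expected frequencies printed here may differ from the slide in the last digits, presumably because the slide used rounded intermediate values.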
[Table: BLOSUM62 matrix. Scores s_ij for the 20 amino acids (A R N D C Q E G H I L K M F P S T W Y V) plus the symbols B, Z, X and *, with the residues annotated as hydrophobic, aromatic, polar, basic or acidic.]
Source: J. van Helden
[Highlighted on the BLOSUM62 matrix: substitutions between polar residues.]
RBLOSUM62
BLOSUM62 miscalculations improve search performance
MP Styczynski, KL Jensen, I Rigoutsos, G Stephanopoulos
Nat. Biotech. 26: 274-275, 2008
Supplementary Figure 5. The revised
BLOSUM matrix, RBLOSUM62.
Values in red are one greater than they
would be in BLOSUM62, while values in
green are one less than they would be in
BLOSUM62.
The entropy of this matrix (based on raw
matrix values) is 0.6626 bits.
Approximate correspondence between PAM and BLOSUM matrices, from more conserved proteins (top) to more distant proteins (bottom):
PAM100   BLOSUM90
PAM120   BLOSUM80
PAM160   BLOSUM60
PAM200   BLOSUM52
PAM250   BLOSUM45
GONNET matrix
[Table: GONNET scoring matrix.]
Source: http://www.ebi.ac.uk/help/matrix.html
[Figure: construction of the GONNET matrix. Protein database -> pair-wise alignments (PAM250) -> new matrix -> iterative pair-wise alignments -> GONNET matrix.]
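Nucleotide substitution matrices
Purines: A, G. Pyrimidines: C, T. A transition replaces a purine by a purine or a pyrimidine by a pyrimidine; a transversion replaces a purine by a pyrimidine or the reverse.

[Table: PAM10 matrix with 90.7 on the diagonal, -2.19 for a transition and -3.70 for a transversion.]

PAM1 probability matrices (Source: Mount, 2004):
Uniform model: 0.99 on the diagonal, 0.00333 for each substitution.
Transition/transversion model: 0.99 on the diagonal, 0.006 for a transition, 0.002 for each transversion.

Exercise
Here are given the probability matrices.

Log-odds matrices shown on the following slides (Source: Mount, 2004):
Uniform model: 1.86 on the diagonal, -3.01 for each substitution.
Transition/transversion model: 1.86 on the diagonal, -2.18 for a transition, -3.70 for a transversion.
Uniform model: 0.83 on the diagonal, -0.46 for each substitution.
Transition/transversion model: 0.88 on the diagonal, 0.069 for a transition, -0.86 for a transversion.

Integer scoring matrices shown on the slides (Source: Mount, 2004):
Uniform model: -6 for each substitution; transition/transversion model: -5 for a transition, -7 for a transversion.
Uniform model: -5 for each substitution; transition/transversion model: -4 for a transition, -6 for a transversion.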
Exercise

Uniform model:
identity %   difference %   PAM index (n)
99            1               1
90.6          9.4            10
78.7         21.3            25
63.5         36.5            50
44.8         55.2           100

Transition/transversion model:
identity %   difference %   PAM index (n)
99            1               1
90.7          9.3            10
79.0         21.0            25
64.2         35.8            50
46.3         53.7           100
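The relation between the PAM index and the expected identity can be explored with the sketch below, which applies a nucleotide PAM1 matrix n times to each starting base. The two matrices follow the probabilities given earlier (0.99 on the diagonal; 0.01/3 per substitution, or 0.006 for a transition and 0.002 for a transversion). The base order A, G, C, T and the function names are assumptions made for this illustration.

```python
def evolve(vector, matrix, steps):
    """Apply a column-stochastic matrix `steps` times to a probability vector."""
    size = len(vector)
    for _ in range(steps):
        vector = [sum(matrix[i][k] * vector[k] for k in range(size))
                  for i in range(size)]
    return vector


def expected_identity(matrix, freqs, n):
    """Expected percentage of positions still carrying their original base after n PAM."""
    total = 0.0
    for j, f_j in enumerate(freqs):
        start = [1.0 if k == j else 0.0 for k in range(len(freqs))]
        total += f_j * evolve(start, matrix, n)[j]
    return 100.0 * total


# Uniform model: every substitution has the same probability.
uniform = [[0.99 if i == j else 0.01 / 3 for j in range(4)] for i in range(4)]

# Transition/transversion model, assumed base order A, G, C, T:
# A<->G and C<->T are transitions (0.006), all other changes are transversions (0.002).
ts, tv = 0.006, 0.002
biased = [
    [0.99, ts, tv, tv],
    [ts, 0.99, tv, tv],
    [tv, tv, 0.99, ts],
    [tv, tv, ts, 0.99],
]

freqs = [0.25] * 4
for n in (1, 10, 25, 50, 100):
    print(n,
          round(expected_identity(uniform, freqs, n), 1),
          round(expected_identity(biased, freqs, n), 1))
```

The values obtained this way are close to those of the tables above but may differ in the last digit, depending on how the PAM1 probabilities were rounded.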
CFTR region
(37% G+C)
HOXD region
(47% G+C)
hum16pter region
(53% G+C)
NCBI-BLAST
[Tables: two NCBI-BLAST nucleotide scoring matrices, with -4 for each mismatch in the first and -2 for each mismatch in the second.]
Source: NCBI
BLOSUM
Henikoff & Henikoff (1992). Amino acid substitution matrices from protein blocks. PNAS 89: 10915-10919.
Reviews
Eddy (2004). Where did the BLOSUM62 alignment score matrix come from? Nature Biotech 22: 1035-1036.