Comparison and Evaluation of Multiple Sequence Alignment Tools in Bioinformatics
Dept. of Computer Science and Information Technology, University Putra Malaysia, 43400 UPM-Serdang, Malaysia
common methods: Sum-of-Pair Score (SPS) and Column Score (CS).
This new perspective offers both advantages and disadvantages with
regard to the choice of a particular MSA tool by users according to
specific biological problems.

2. Methodology

2.1 Functionality and Features
Main features and specifications were selected with functionality in
view, as listed in Fig. 1. These features affect the usability, and
therefore the popularity, of each program. Comparing and contrasting
the tools yielded the detailed criteria summarized in Table I.

2.2 Algorithms and Accuracy
The latter process compares the “heart” of the programs, i.e., the
algorithms that define the quality and biological meaning of their
results.
BALiBASE version 3 (Thompson et al., 2005) was used as the globally
accepted benchmark. The multiple sequence alignment tools were run
separately through their web interfaces on the protein groups of the
BALiBASE reference datasets. Default parameters were used according to
the defined settings. The quality of the alignments was initially
assessed through the scoring system implemented in BALiBASE. Results
were then plotted to make visual comparison possible. Needless to say,
the most accurate measure was the one closest to BALiBASE.

Fig. 2 illustrates Scorecons results for a series of alignments,
compared against BALiBASE scores. The minimum distance to the BALiBASE
conservation score was computed, and credit was then given to the
corresponding tool.
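The credit-assignment step, in which the tool whose conservation score lies closest to the BALiBASE score in a column receives one point, can be illustrated with a minimal Python sketch. It assumes that "distance" means the absolute difference between a tool's per-column conservation score and the BALiBASE score; the function name `closest_tool_counts` and all numbers are hypothetical and chosen only for illustration, not taken from the study.

```python
def closest_tool_counts(reference, tool_scores):
    """Count, per alignment column, which tool's score is closest to the reference.

    reference   -- per-column conservation scores of the benchmark alignment
    tool_scores -- dict mapping tool name to its per-column conservation scores
    Assumption: distance is the absolute difference of the two scores.
    """
    counts = {name: 0 for name in tool_scores}
    for i, ref in enumerate(reference):
        distances = {name: abs(scores[i] - ref)
                     for name, scores in tool_scores.items()}
        # On a tie, min() keeps the tool listed first in the dict.
        best = min(distances, key=distances.get)
        counts[best] += 1
    return counts


# Hypothetical conservation scores for four alignment columns.
reference = [0.90, 0.40, 0.75, 0.10]
tool_scores = {
    "Clustal": [0.85, 0.10, 0.70, 0.50],
    "MUSCLE": [0.60, 0.35, 0.90, 0.15],
    "T-Coffee": [0.70, 0.80, 0.74, 0.40],
}
counts = closest_tool_counts(reference, tool_scores)
print(counts)
```

With these made-up values MUSCLE is closest in two columns and the other two tools in one column each. How ties are broken is a design choice of this sketch and is not specified in the text.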
IJCSNS International Journal of Computer Science and Network Security, VOL.9 No.7, July 2009
Fig. 3 shows several Excel files that include the conservation score
results and the calculation of the minimum distance for each. Fig. 4
shows the algorithm used to find the overall distance between each of
these tools and the BALiBASE benchmark.

For i := 1 to number of columns in BALiBASE
{
    Distance1 := |Conservation score (Clustal) - Conservation score (BALiBASE)|
    Distance2 := |Conservation score (MUSCLE) - Conservation score (BALiBASE)|
    Distance3 := |Conservation score (T-Coffee) - Conservation score (BALiBASE)|
    Minimum score := min (Distance1, Distance2, Distance3)
    If Minimum score = Distance1
        Clustal.count := Clustal.count + 1
    Else If Minimum score = Distance2
        MUSCLE.count := MUSCLE.count + 1
    Else
        T-Coffee.count := T-Coffee.count + 1
}
Minimum of distance := min (T-Coffee.count, MUSCLE.count, Clustal.count)

Fig. 4 The proposed algorithm for computing the SCS.

remarkable length of 50000 characters, T_Coffee is limited to a mere
2000 sequences and is thus unsuitable for such calculation.
Portability among different operating systems is of paramount
significance, as users may intend to run the program on their own PC
rather than through web interfaces. T_Coffee seems deficient in this
respect since it cannot be run natively in Windows and requires Cygwin
to provide a Linux-like environment.

4. Experimental Results

Fig. 5, Fig. 6 and Fig. 7 summarize the results of the Friedman test on
the data obtained from the Scorecons Score, Sum-of-Pair Score, and
Column Score, respectively, for each of the reference datasets in
BALiBASE 3.0.
Noticeably, there is a statistically significant difference in the
comparison.
This points to the need for improvement, as there is a considerable gap
between the results of the MSA tools and the already established
BALiBASE benchmark values.
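For readers unfamiliar with the Friedman test reported in Fig. 5 to Fig. 7, the following Python sketch shows how the statistic could be computed for three tools scored on the same set of reference alignments. It is an illustration under stated assumptions, not the authors' procedure: the helper names `average_ranks` and `friedman_statistic` and the score values are hypothetical, no tie correction is applied, and the p-value line relies on the chi-square approximation with 2 degrees of freedom, which holds only for exactly three tools and is rough for so few rows.

```python
import math


def average_ranks(values):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1..end+1
        for j in order[pos:end + 1]:
            ranks[j] = shared
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square for n blocks (rows) by k treatments (columns).

    Each row is ranked separately; no correction for ties is applied.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical scores: one row per reference alignment,
# columns = (Clustal, MUSCLE, T-Coffee). Not data from the study.
scores = [
    [0.80, 0.85, 0.90],
    [0.60, 0.70, 0.75],
    [0.55, 0.65, 0.60],
    [0.70, 0.72, 0.78],
]
q = friedman_statistic(scores)
# Chi-square survival function for 2 degrees of freedom (three tools only).
p = math.exp(-q / 2)
print(f"Friedman statistic = {q:.2f}, approximate p = {p:.4f}")
```

In practice a statistics package would normally be used instead; SciPy, for example, provides `scipy.stats.friedmanchisquare`, which also corrects for ties.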
http://www.snowedin.net/windowinthebox/Paper.pdf.
[5] Notredame, C., Higgins, D.G. & Heringa, J. (2000).
T-Coffee: A novel method for fast and accurate
multiple sequence alignment. J Mol Biol, 302(1),
205-217.
[6] Thompson, J.D., Higgins, D.G. and Gibson, T.J.
(1994) CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through
sequence weighting, position specific gap penalties
and weight matrix choice. Nucleic Acids Research,
22(22), 4673-4680.
[7] Edgar, R.C. (2004). MUSCLE: multiple sequence
alignment with high accuracy and high throughput.
Nucleic Acids Research, 32(5), 1792-1797.
[8] Valdar, W.S.J. (2002). Scoring residue conservation.
Proteins, 48(2), 227-241.
[9] Thompson, J.D., Koehl, P., Ripp, R. & Poch, O.
(2005). BALiBASE 3.0: latest developments of the
multiple sequence alignment benchmark. Proteins,
61, 127-136.
[10] Thompson, J.D., Plewniak, F. & Poch, O. (1999a). A
comprehensive comparison of multiple sequence
alignment programs. Nucleic Acids Res, 27, 2682-2690.
doi:10.1093/nar/27.13.2682. PMID 10373585.
[11] Thompson, J.D., Plewniak, F. & Poch, O. (1999b).
BALiBASE: a benchmark alignment database for
the evaluation of multiple alignment programs.
Nucleic Acids Res, 29, 3110-20.