Bioinformatics Paper
Figure 1
The term bioinformatics first came into use in the 1990s and was originally synonymous with the management and analysis of DNA, RNA and protein sequence data. Computational tools for sequence analysis had been available since the 1960s, but this was a minority interest until advances in sequencing technology led to a rapid expansion in the number of sequences stored in databases such as GenBank. Now, the term has expanded to incorporate many other types of biological data, for example protein structures, gene expression profiles and protein interactions. Each of these areas requires its own set of databases, algorithms and statistical methods.

Computers are essential to this work for two reasons. First, many bioinformatics problems require the same task to be repeated millions of times. Examples include comparing a new sequence to every other sequence stored in a database, or comparing a group of sequences systematically to determine evolutionary relationships. In such cases, the ability of computers to process information and test alternative solutions rapidly is indispensable.

Second, computers are required for their problem-solving power. Typical problems that might be addressed using bioinformatics include solving the folding pathway of a protein given its amino acid sequence, or deducing a biochemical pathway given a collection of RNA expression profiles. Computers can help with such problems, but it is important to note that expert input and robust original data are also required.

Figure 2

The future of bioinformatics is integration. For example, integration of a wide variety of data sources such as clinical and genomic data will allow us to use disease symptoms to predict genetic mutations and vice versa. The integration of GIS data, such as maps and weather systems, with crop health and genotype data will allow us to predict successful outcomes of agricultural experiments. Another future area of research in bioinformatics is large-scale comparative genomics. For example, the development of tools that can do 10-way comparisons of genomes will push forward the discovery rate in this field of bioinformatics. Along these lines, the modelling and visualization of full networks of complex systems could be used in the future to predict how the system (or cell) reacts to a drug, for example. A technical set of challenges faces bioinformatics and is being addressed by faster computers, technological advances in disk storage space, and increased bandwidth. Finally, a key research question for the future of bioinformatics will be how to computationally compare complex biological observations, such as gene expression patterns and protein networks. Bioinformatics is about converting biological observations to a model that a computer will understand. This is a very challenging task since biology can be very complex. The problem of how to digitize phenotypic data such as behaviour, electrocardiograms and crop health into a computer-readable form offers exciting challenges for future bioinformaticians [2].

The aims of bioinformatics are threefold. First, at its simplest, bioinformatics organises data in a way that allows researchers to access existing information and to submit new entries as they are produced, e.g. the Protein Data Bank for 3D macromolecular structures [6,7]. While data curation is an essential task, the information stored in these databases is essentially useless until analysed. Thus the purpose of bioinformatics extends much further.

The second aim is to develop tools and resources that aid in the analysis of data. For example, having sequenced a particular protein, it is of interest to compare it with previously characterised sequences. This needs more than a simple text-based search, and programs such as FASTA [8] and PSI-BLAST [9] must consider what comprises a biologically significant match. Development of such resources requires expertise in computational theory as well as a thorough understanding of biology.

The third aim is to use these tools to analyse the data and interpret the results in a biologically meaningful manner. Traditionally, biological studies examined individual systems in detail, and frequently compared them with a few that are related. In bioinformatics, we can now conduct global analyses of all the available data with the aim of uncovering common principles that apply across many systems and highlighting novel features.
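
To illustrate why sequence comparison needs more than a text search, and why the same task is repeated for every database entry, here is a minimal sketch of Smith-Waterman local alignment scoring applied to a tiny in-memory database. It is not how FASTA or PSI-BLAST work internally (those programs use heuristics and statistical significance estimates that are not shown here). The function names, the toy sequences and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    Uses a linear gap penalty; the scoring values are illustrative
    assumptions, not parameters taken from any real tool.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_database(query, database):
    """Score the query against every entry and sort best-first."""
    scored = [(name, local_alignment_score(query, seq))
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; real databases hold millions of entries.
    toy_db = {"seq_x": "TTTT", "seq_y": "GGACGTGG"}
    for name, score in rank_database("ACGT", toy_db):
        print(name, score)
```

The cost grows with the product of the two sequence lengths for each comparison, which is one reason database search tools rely on faster heuristics in practice.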
Data source                 Bioinformatics topics

Raw DNA sequence            Separating coding and non-coding regions
                            Identification of introns and exons
                            Gene product prediction
                            Forensic analysis

Protein sequence            Sequence comparison algorithms
                            Multiple sequence alignment algorithms
                            Identification of conserved sequence motifs

Macromolecular structure    Secondary, tertiary structure prediction
                            3D structural alignment algorithms
                            Protein geometry measurements
                            Surface and volume shape calculations
                            Intermolecular interactions
                            Molecular simulations (force-field calculations,
                            molecular movements, docking predictions)

Genomes                     Characterisation of repeats
                            Structural assignments to genes
                            Phylogenetic analysis
                            Genomic-scale censuses (characterisation of
                            protein content, metabolic pathways)
                            Linkage analysis relating specific genes to diseases

Gene expression             Correlating expression patterns
                            Mapping expression data to sequence, structural
                            and biochemical data

Other data
  Literature                Digital libraries for automated bibliographical searches
                            Knowledge databases of data from literature
  Metabolic pathways        Pathway simulations

Table 1. Sources of data used in bioinformatics, the quantity of each type of data that is currently (August 2000) available, and bioinformatics subject areas that utilise this data.

One of the earliest medical applications of bioinformatics has been in aiding rational drug design. Figure 3 outlines the commonly cited approach, taking the MLH1 gene product as an example drug target. MLH1 is a human gene encoding a mismatch repair protein (mmr) situated on the short arm of chromosome 3 [125]. Through linkage analysis and its similarity to mmr genes in mice, the gene has been implicated in nonpolyposis colorectal cancer [126]. Given the nucleotide sequence, the probable amino acid sequence of the encoded protein can be determined using translation software. Sequence search techniques can then be used to find homologues in model organisms, and based on sequence similarity, it is possible to model the structure of the human protein on experimentally characterised structures. Finally, docking algorithms could design molecules that could bind the model structure, leading the way for biochemical assays to test their biological activity on the actual protein.

Figure 3. Above is a schematic outlining how scientists can use bioinformatics to aid rational drug discovery. MLH1 is a human gene encoding a mismatch repair protein (mmr) situated on the short arm of chromosome 3. Through linkage analysis and its similarity to mmr genes in mice, the gene has been implicated in nonpolyposis colorectal cancer. Given the nucleotide sequence, the probable amino acid sequence of the encoded protein can be determined using translation software. Sequence search techniques can be used to find homologues in model organisms, and based on sequence similarity, it is possible to model the structure of the human protein on experimentally characterised structures.
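
The translation step mentioned above, going from a nucleotide sequence to the probable amino acid sequence, can be sketched in a few lines. This is a minimal illustration using the standard genetic code and a single forward reading frame starting at the first base. The function name `translate`, the `stop_at_stop` parameter and the short example sequence are assumptions made for the sketch (the sequence is a made-up toy, not MLH1), and real translation software also handles other reading frames, alternative genetic codes and ambiguity codes.

```python
from itertools import product

# Standard genetic code, listed with bases in T, C, A, G order
# (third base varies fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, stop_at_stop=True):
    """Translate a DNA (or RNA) string in reading frame 1.

    Unknown codons become "X"; trailing bases that do not fill a
    complete codon are ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # Toy sequence for illustration only: start codon, one codon, stop.
    print(translate("ATGGCCTAA"))
```

The resulting protein string is what a sequence search step would then compare against known sequences to find homologues.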