Protein Database

bioinformatics

Uploaded by

Sneha
Copyright
© All Rights Reserved
Protein Databases: Types and Importance

• As biology has increasingly turned into a data-rich science, the need for
storing and communicating large datasets has grown tremendously.
• The obvious examples are nucleotide sequences, protein sequences,
and the 3D structural data produced by X-ray crystallography and
macromolecular NMR.
• The biological information of proteins is available as sequences and
structures. Sequences are represented in a single dimension, whereas
structures record the three-dimensional arrangement of the sequence.
• A biological database is a collection of data that is organized so that its
contents can easily be accessed, managed, and updated.
• A protein database is one or more datasets about proteins, which could
include a protein's amino acid sequence, conformation, structure, and
features such as active sites.
• Protein sequence databases are largely compiled by translating DNA
sequences from different gene databases, and protein databases may also
include structural information. They are an important resource because
proteins mediate most biological functions.
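As a rough illustration of what one record in such a dataset might hold, the sketch below models a protein entry with a one-dimensional sequence, optional three-dimensional coordinates, and features such as active sites. The class names, field names, accession, and sequence are all made up for this example and do not come from any real database schema.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """An annotated region of the sequence (hypothetical layout)."""
    kind: str   # e.g. "active site"
    start: int  # 1-based position of the first residue
    end: int    # 1-based position of the last residue (inclusive)


@dataclass
class ProteinRecord:
    """One illustrative database entry; not a real schema."""
    accession: str
    sequence: str  # one-dimensional: single-letter amino acid codes
    # three-dimensional: one (x, y, z) coordinate per atom, if a structure is known
    coordinates: list[tuple[float, float, float]] = field(default_factory=list)
    features: list[Feature] = field(default_factory=list)

    def feature_residues(self, kind: str) -> list[str]:
        """Return the residues covered by every feature of the given kind."""
        return [
            self.sequence[f.start - 1:f.end]
            for f in self.features
            if f.kind == kind
        ]


# Made-up entry: accession and sequence are placeholders.
record = ProteinRecord(
    accession="DEMO0001",
    sequence="MKTAYIAKQR",
    features=[Feature("active site", 4, 4)],
)
print(record.feature_residues("active site"))
```

Keeping the sequence as a plain string and the structure as a list of coordinates mirrors the distinction between one-dimensional and three-dimensional data made above.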

Importance of Protein Databases


Huge amounts of data on protein structures, functions, and particularly sequences are
being generated. Searching databases is often the first step in the study of a new
protein. Protein databases have the following uses:
1. Comparison between proteins or between protein families provides
information about the relationships between proteins within a genome or
across different species, and hence offers much more information than can be
obtained by studying an isolated protein.
2. Secondary databases derived from experimental databases are also widely
available. These databases reorganize and annotate the data or provide
predictions.
3. The use of multiple databases often helps researchers understand the
structure and function of a protein.
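One simple way to compare two proteins, once their sequences have been aligned, is to compute the percentage of identical positions. The sketch below assumes the two sequences are already aligned to equal length with "-" marking gaps; the function name and the example sequences are invented for illustration, and real database searches use far more elaborate scoring.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where either sequence has a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Two made-up, pre-aligned sequences differing at two of eight positions.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))  # 75.0
```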

Primary Protein Databases


The PRIMARY databases hold protein sequences, many of which are inferred from the
conceptual translation of experimentally determined nucleotide sequences. A translated
sequence is, of course, not experimentally derived information; it arises from an
interpretation of the nucleotide sequence and consequently must be treated as potentially
containing misinterpreted information. There are a number of primary protein sequence
databases, and each requires some specific consideration.
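Conceptual translation can be sketched in a few lines. The example below assumes the standard genetic code and a coding sequence that is already in the correct reading frame; the function name and the input sequences are illustrative only. Translating the same DNA one base out of frame gives a different protein, which is one way a translated entry can come to contain misinterpreted information.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate DNA (or RNA) codon by codon, stopping at the first stop codon."""
    dna = coding_sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))  # MAK
print(translate("TGGCCAAATAA"))   # WPN (same DNA read one base out of frame)
```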
a. Protein Information Resource (PIR) – Protein Sequence Database (PIR-PSD):
• The PIR-PSD is a collaborative endeavor between the PIR, the MIPS
(Munich Information Centre for Protein Sequences, Germany) and the JIPID
(Japan International Protein Information Database, Japan).
• The PIR-PSD is now a comprehensive, non-redundant, expertly annotated
database managed in an object-relational DBMS.
• A unique characteristic of the PIR-PSD is its classification of protein
sequences based on the superfamily concept.
• Sequences in the PIR-PSD are also classified by homology domains and
sequence motifs.
• Homology domains may correspond to evolutionary building blocks, while
sequence motifs represent functional sites or conserved regions.
• This classification approach allows a more complete understanding of
sequence-structure-function relationships.
b. SWISS-PROT
• The other well-known and extensively used protein database is
SWISS-PROT. Like the PIR-PSD, this curated protein sequence database also
provides a high level of annotation.
• The data in each entry can be considered separately as core data and
annotation.
• The core data consist of the sequence, entered in the common single-letter
amino acid code, and the related references and bibliography. The taxonomy
of the organism from which the sequence was obtained also forms part of
this core information.
• The annotation contains information on the function or functions of the
protein; post-translational modifications such as phosphorylation and
acetylation; functional and structural domains and sites, such as
calcium-binding regions, ATP-binding sites, and zinc fingers; known secondary
structural features, for example alpha helices and beta sheets; the quaternary
structure of the protein; similarities to other proteins, if any; and diseases
associated with the protein. Conflicts that may arise due to different authors
publishing different sequences for the same protein, or variants due to
mutations in different strains of an organism, are also described as part
of the annotation.
TrEMBL (for Translated EMBL) is a computer-annotated protein sequence database
that is released as a supplement to SWISS-PROT. It contains translations of all
coding sequences present in the EMBL Nucleotide Sequence Database that have not yet
been fully annotated and integrated into SWISS-PROT. It may therefore contain sequences
of proteins that are never expressed and have never actually been identified in the organism.
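The split between core data and annotation can be seen in the SWISS-PROT flat-file layout, where each line starts with a two-letter code (for example ID, AC, OS, CC, FT, SQ) and an entry ends with "//". The sketch below parses a made-up, simplified entry; the accession, names, and sequence are placeholders, and the grouping of line codes into "core" and "annotation" is an assumption made here to match the description above, not an official classification.

```python
# Assumed grouping: identification, taxonomy and reference lines count as core data.
CORE_CODES = {"ID", "AC", "DE", "OS", "OC", "RN", "RA", "RT", "RL"}

# A made-up, simplified entry in flat-file style (not a real record).
SAMPLE_ENTRY = "\n".join([
    "ID   DEMO_HUMAN              Reviewed;          12 AA.",
    "AC   X00001;",
    "DE   RecName: Full=Demo protein;",
    "OS   Homo sapiens (Human).",
    "CC   -!- FUNCTION: Placeholder text for illustration only.",
    "FT   MOD_RES         3",
    'FT                   /note="Phosphoserine"',
    "SQ   SEQUENCE   12 AA;",
    "     MKSAYIAKQR QI",
    "//",
])


def parse_entry(text):
    """Split one flat-file entry into (core, annotation) dictionaries."""
    core = {"sequence": ""}
    annotation = {}
    in_sequence = False
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):  # end-of-entry marker
            break
        code, content = line[:2], line[5:].strip()
        if in_sequence and code == "  ":
            # Sequence lines are indented and grouped in blocks of ten residues.
            core["sequence"] += content.replace(" ", "")
        elif code == "SQ":
            in_sequence = True
        elif code in CORE_CODES:
            core.setdefault(code, []).append(content)
        else:
            annotation.setdefault(code, []).append(content)
    return core, annotation


core, annotation = parse_entry(SAMPLE_ENTRY)
print(core["sequence"])    # MKSAYIAKQRQI
print(sorted(annotation))  # ['CC', 'FT']
```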
c. Protein Data Bank (PDB):
• The PDB is a primary protein structure database. It is a crystallographic
database for the three-dimensional structures of large biological molecules,
such as proteins.
• In spite of the name, the PDB archives the three-dimensional structures of not
only proteins but also other biologically important molecules, such as nucleic
acid fragments, RNA molecules, large peptides such as the antibiotic gramicidin,
and complexes of proteins and nucleic acids.
• The database holds data derived mainly from three sources: structures
determined by X-ray crystallography, by NMR experiments, and, in earlier
releases, by molecular modeling.
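To show what three-dimensional data look like in practice, the sketch below reads atom coordinates from text in the fixed-column PDB file format, where ATOM records carry the atom name, residue name, chain, residue number, and x, y, z coordinates in ångströms at fixed column positions. The coordinates here are invented for illustration and do not come from a real PDB entry; `Atom` and `parse_atoms` are names chosen for this example.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


# Invented coordinates laid out in PDB fixed columns (not a real entry).
SAMPLE_PDB = "\n".join([
    "REMARK illustrative coordinates only",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
    "ATOM      9  CA  LYS A   2      27.766  28.413   4.642",
    "END",
])


def parse_atoms(pdb_text):
    """Extract ATOM records using the fixed column positions of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atoms.append(Atom(
            name=line[12:16].strip(),          # columns 13-16
            residue=line[17:20],               # columns 18-20
            chain=line[21],                    # column 22
            residue_number=int(line[22:26]),   # columns 23-26
            x=float(line[30:38]),              # columns 31-38
            y=float(line[38:46]),              # columns 39-46
            z=float(line[46:54]),              # columns 47-54
        ))
    return atoms


atoms = parse_atoms(SAMPLE_PDB)
alpha_carbons = [a for a in atoms if a.name == "CA"]
# Distance between the alpha carbons of residues 1 and 2.
distance = math.dist(alpha_carbons[0][4:], alpha_carbons[1][4:])
print(len(atoms), round(distance, 2))
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can run together without a separating space.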