Protein Modelling
THE PROBLEM
Can we predict the 3-dimensional shape of a protein given its amino acid sequence alone?
Generally, no.
But
PROTEINS
Amino
PROTEINS
What determines protein fold?
- Rigidity of the backbone
- Interactions among amino acids
- Amino acid interactions with water
SECONDARY STRUCTURE
Folding of the linear sequence of a protein into regular repeating patterns:
- α-helix
- β-sheet
- Coil or loop
SECONDARY STRUCTURES
CATH TAXONOMY
Database containing hierarchical domain classifications of protein structures from the PDB:
- C: Class (C-level). Defined by secondary structure composition
- A: Architecture (A-level). Defined by overall shape of the domain structure
- T: Topology (T-level). Defined by overall shape and connectivity of the domain structures
Prediction in 1D
- Secondary structure
- Solvent accessibility
- Transmembrane helices
Prediction in 2D
Inter-residue/strand contacts
Prediction in 3D
- Homology modeling
- Fold recognition
- Ab initio prediction
1D SECONDARY STRUCTURE
Given
What
- Make a prediction for a given residue by considering a window of n neighboring residues
- Determine a model that performs the mapping from a window of residues to a secondary structure state
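The window idea can be sketched in a few lines of Python. This is a minimal illustration, not a real predictor: the function names, the "-" padding character and the toy rule in `toy_model` are all assumptions made for the example. A real model would be trained on proteins of known structure.

```python
def residue_windows(sequence, n=7, pad="-"):
    """Return one window of n residues centred on each position of sequence."""
    if n % 2 == 0:
        raise ValueError("window size n must be odd so it has a central residue")
    half = n // 2
    # Pad both ends so the first and last residues also get a full window.
    padded = pad * half + sequence + pad * half
    return [padded[i:i + n] for i in range(len(sequence))]


def predict_states(sequence, model, n=7):
    """Map each residue's window to a secondary structure state using model."""
    return "".join(model(window) for window in residue_windows(sequence, n))


def toy_model(window):
    # Hypothetical stand-in rule: call the central residue helix (H) if the
    # window holds at least three alanines/leucines, otherwise coil (C).
    return "H" if sum(window.count(aa) for aa in "AL") >= 3 else "C"


windows = residue_windows("MKALLAG", n=3)
states = predict_states("MKALLAG", toy_model, n=5)
```

Any function that takes a window string and returns one state letter can be passed in place of `toy_model`.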
Homology Modelling
Most accurate when the target and template have similar sequences
Template: a homologous protein whose structure was determined using high-resolution experimental methods (e.g., X-ray crystallography or NMR).
Use sequence alignment search programs (e.g. BLAST) to identify homologous sequences in protein structure databases like the PDB.
Selection of the template can be:
- Select the template with the highest sequence identity
- Select a potentially different template for each similar segment of the protein sequence
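Picking the template with the highest sequence identity can be sketched as below. The sketch assumes the pairwise alignments have already been produced by a search tool; the template names and sequences are made up, and identity is computed over non-gap columns only, which is one of several conventions in use.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent of aligned (non-gap) columns where the two residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def best_template(alignments):
    """alignments maps a template name to (aligned_target, aligned_template)."""
    return max(alignments, key=lambda name: percent_identity(*alignments[name]))


# Hypothetical pre-computed alignments (names and sequences are illustrative).
alignments = {
    "template_A": ("MKT-LLVA", "MKTALLVG"),
    "template_B": ("MKTLLVA", "MRSLIVA"),
}
chosen = best_template(alignments)
```

The per-segment strategy would apply the same function to each aligned segment separately instead of to the whole sequence.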
Better to use high-resolution structures as the model template. Also consider: function, ligands, environment.
Accuracy of the alignment is the critical parameter for successful homology modelling.
Build a model using the known structures of the homologous template protein. Common modelling methods:
- by assembly of rigid bodies
- by segment matching (e.g. SEGMOD)
- by satisfaction of spatial restraints (e.g. MODELLER)
Assembly of rigid bodies: the model is assembled from a small number of rigid bodies obtained from the aligned protein structures. Proteins can be dissected into:
- conserved core regions
- variable loops that connect the conserved core regions
- side chains that decorate the backbone
Segment matching: based on the finding that most hexapeptide segments of protein structure can be clustered into only 100 structurally different classes.
- Segments of the template, usually the conserved segments, serve as guiding positions
- Segments of the target protein that fit these guiding positions are identified and assembled
Satisfaction of spatial restraints: starts by generating many constraints or restraints on the structure of the target sequence. Restraints are obtained by:
- assuming that the corresponding distances between aligned residues in the template and the target structures are similar
- considering stereochemical restraints on bond lengths, bond angles, dihedral angles, and non-bonded atom-atom contacts
The model is then derived by minimizing the violations of all the restraints, which is achieved either by distance geometry or real-space optimization. Software used: MODELLER.
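The idea of minimizing restraint violations can be illustrated with a toy real-space optimization. This is only a sketch under stated assumptions: the violation is taken to be a sum of squared distance errors, plain gradient descent is used, and the three restraints and starting coordinates are invented. It is not MODELLER's actual objective function or optimizer.

```python
import math


def violation(coords, restraints):
    """Sum of squared differences between model and target distances."""
    total = 0.0
    for i, j, target in restraints:
        total += (math.dist(coords[i], coords[j]) - target) ** 2
    return total


def minimize(coords, restraints, step=0.05, iterations=2000):
    """Plain gradient descent on the violation function."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, target in restraints:
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction is undefined for coincident points
            factor = 2.0 * (d - target) / d
            for k in range(3):
                g = factor * (coords[i][k] - coords[j][k])
                grads[i][k] += g
                grads[j][k] -= g
        # Move every point a small step against its gradient.
        for p, g in zip(coords, grads):
            for k in range(3):
                p[k] -= step * g[k]
    return coords


# Hypothetical restraints: (point i, point j, target distance), standing in
# for distances copied from aligned residues in a template.
restraints = [(0, 1, 3.8), (1, 2, 3.8), (0, 2, 5.5)]
start = [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (1.5, 1.0, 0.3)]
model = minimize(start, restraints)
before = violation(start, restraints)
after = violation(model, restraints)
```

Comparing `before` and `after` shows how far the optimization reduced the total violation for this toy case.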
The validity of the constructed model must be checked:
- Evaluate the stereochemistry and other structural features of the model (e.g., bond lengths, dihedral angles, side-chain rotamers)
- Check the hydrophobic core, solvent accessibility, distribution of charged groups, atom-atom distances, atomic volumes and main-chain hydrogen bonding
A number of online servers are available to evaluate 3D models, including PSVS, Eval123D and JCSG.
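As a small illustration of one stereochemical check, a dihedral angle can be computed from four atom positions with the standard atan2 formula. The function name and the test coordinates are assumptions for this sketch. For backbone angles, phi uses the atoms C(i-1), N(i), CA(i), C(i) and psi uses N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Illustrative coordinates: a planar trans arrangement and a planar cis one.
trans_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
```

A validation script would compute these angles for every residue of the model and flag values that fall outside the commonly observed regions.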
- mistakes in the alignment of the sequence to the template
- selecting the wrong template
- errors in modelling side chains
- errors in modelling sequence segments without a template
- Large bias towards the template
- Can't study conformational changes
- Can't find new catalytic or active sites
- Can't explain the activity or lack of activity of the protein
Protein Threading:
What It Is, When To Do It and How It Is Done
Homology modeling has its limitations, so protein threading makes up for them.
So, when should we do it?
1. We have a sequence of unknown structure.
2. The sequence has no detectable homology to anything of known structure.
3. There are no functional clues as to the structural class of the unknown.
But these situations aren't always recognised.
Score function: measures the match between the unknown sequence and the target sequence.
Number of amino acids of type i in the environment m
Unknown Sequence
Target Sequence
But the good thing is, we know the characteristics of the amino acids present.
H-bond donor, H-bond acceptor, glycine, hydrophobic
Candidate #2: S = 5
Candidate #3: S = -3
Position on Sequence
We get a high score when the sequence of amino acids in the unknown corresponds closely to that of the target sequence.
The factors that account for the correspondence are as follows:
- amino acid preferences for solvent accessibility
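A threading score built from counts of amino acid types per environment can be sketched as follows. The slides do not give the formula, so the log-odds form used here is an assumption, as are the two residue classes, the two environments and all the counts. Every count must be greater than zero for the logarithm to be defined.

```python
import math


def log_odds_table(counts):
    """counts[m][i] = number of amino acids of type i seen in environment m."""
    total = sum(sum(row.values()) for row in counts.values())
    type_totals = {}
    for row in counts.values():
        for aa, n in row.items():
            type_totals[aa] = type_totals.get(aa, 0) + n
    table = {}
    for env, row in counts.items():
        env_total = sum(row.values())
        # Frequency of the type in this environment relative to its
        # overall frequency, on a log scale.
        table[env] = {
            aa: math.log((n / env_total) / (type_totals[aa] / total))
            for aa, n in row.items()
        }
    return table


def threading_score(sequence_types, environments, table):
    """Sum the per-position scores of a sequence placed onto a structure."""
    return sum(table[env][aa] for aa, env in zip(sequence_types, environments))


# Hypothetical counts for two solvent-accessibility environments.
counts = {
    "buried": {"hydrophobic": 80, "polar": 20},
    "exposed": {"hydrophobic": 30, "polar": 70},
}
table = log_odds_table(counts)
environments = ["buried", "buried", "exposed", "exposed"]
good = threading_score(["hydrophobic", "hydrophobic", "polar", "polar"], environments, table)
bad = threading_score(["polar", "polar", "hydrophobic", "hydrophobic"], environments, table)
```

With these invented counts, the candidate whose residue types suit their environments gets a positive score and the mismatched candidate a negative one.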
Protein threading will be obsolesced without ever really having had a phase of glory. (Torda, 2003)
Less than 30% of the predicted first hits are true remote homologues.
The sequence already has a very high homology with a known structure. The protein has unusual characteristics.
But then again, times have changed. Tools now exist to get reliable scores.
Application
Usually, homology modeling is applied in the following fields:
1. 2.
3.
Innovation
MODELLER
- Freely available for academic use
- Can be used to model proteins and docking
- Produces output that does not include H atoms
- Flexible
Thank you!