Bioinformatics Genome Organisation


Genome organization

Bacterial Chromosomes in the Nucleoid



The Nucleoid
The nucleoid (meaning nucleus-like) is an irregularly-shaped region within the cell of a
prokaryote that contains all or most of the genetic material.

In contrast to the nucleus of a eukaryotic cell, it is not surrounded by a nuclear membrane.


The genome of prokaryotic organisms generally is a circular, double-stranded piece of DNA, of
which multiple copies may exist at any time.

The length of a genome varies widely, but is typically a few million base pairs.

The nucleoid can be visualized on an electron micrograph at high
magnification, where it stands out clearly against the cytosol. Sometimes even
strands of what is thought to be DNA are visible.

The nucleoid can also be seen under a light microscope by staining it with
the Feulgen stain, which specifically stains DNA.

The DNA-binding fluorescent stains DAPI and ethidium bromide are widely used
for fluorescence microscopy of nucleoids.

Experimental evidence suggests that the nucleoid is composed largely of DNA
(about 60%), plus a small amount of RNA and protein. The latter two
constituents are likely to be mainly messenger RNA and the transcription
factor proteins that regulate the bacterial genome.

Proteins helping to maintain the supercoiled structure of the nucleic acid are
known as nucleoid proteins or nucleoid-associated proteins, and are distinct
from histones of eukaryotic nuclei.

In contrast to histones, the DNA-binding proteins of the nucleoid do not form
nucleosomes, in which DNA is wrapped around a protein core.

Instead, these proteins often use other mechanisms, such as DNA looping, to
promote compaction.
The Genophore
A genophore is the DNA of a prokaryote. It is commonly referred to as a
prokaryotic chromosome.

The term “chromosome” is misleading, because the genophore lacks chromatin.


The genophore is compacted through a mechanism known as supercoiling, but a
chromosome is additionally compacted through the use of chromatin.

The genophore is circular in most prokaryotes, and linear in very few.

The circular nature of the genophore allows replication to occur without
telomeres.

Genophores are generally much smaller than eukaryotic chromosomes.

A genophore can be as small as 580,073 base pairs (Mycoplasma genitalium).

Many eukaryotes (such as plants and animals) carry genophores in organelles
such as mitochondria and chloroplasts.

These organelles are very similar to true prokaryotes.


Bacterial Chromosomes:
The bacterial chromosome is a double-stranded circular DNA molecule. In
general, bacterial DNA ranges from 1100 µm to 1400 µm in length.

An E. coli cell contains about 4.2 x 10^6 bp (4.2 x 10^3 kbp) of DNA, which is
about 1.3 mm (1300 µm) in length.
Such a long DNA molecule must be greatly folded to be packaged in a
small space of 1.7 x 0.65 µm.
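
The length figure can be checked with a short calculation. The sketch below assumes a rise of 0.34 nm per base pair (a standard textbook value for B-form DNA that is not stated in these notes) and takes the E. coli genome as roughly 4.2 million base pairs. It gives a contour length of roughly 1.4 mm, the same order as the 1.3 mm quoted here.

```python
# Rise per base pair in B-form DNA (an assumed textbook value, not from the notes).
NM_PER_BP = 0.34


def contour_length_um(base_pairs: float) -> float:
    """Return the stretched-out length of a DNA molecule in micrometres."""
    return base_pairs * NM_PER_BP / 1000.0  # nm -> µm


ecoli_bp = 4.2e6       # approximate E. coli genome size in base pairs
cell_length_um = 1.7   # long dimension of the cell, from the notes

length_um = contour_length_um(ecoli_bp)
print(f"Contour length: {length_um:.0f} µm")
print(f"Ratio to cell length: {length_um / cell_length_um:.0f}x")
```

The ratio shows why the DNA must be folded several hundred-fold to fit inside the cell.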

The bacterial chromosome is folded into loops or domains which are
about 100 in number.

A chromosomal domain may be defined as a discrete structural entity
within which supercoiling is independent of the other domains.

Thus different domains can maintain different degrees of supercoiling.


The DNA chain is coiled on itself to produce supercoiling.

The ends of the loops or domains are bound in some way which does not
allow rotational events to propagate from one domain to another.

If an endonuclease puts a nick in a DNA strand of one domain, this loop becomes larger due to the
uncoiling, but the other domains are not affected. Each domain contains about 40 kbp (13 µm)
of DNA.
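
As a rough consistency check on the domain figures, the sketch below multiplies them out. It again assumes 0.34 nm per base pair, a textbook value not given in these notes.

```python
NM_PER_BP = 0.34  # assumed rise per base pair for B-form DNA

domain_kbp = 40      # DNA per domain, from the notes
domain_count = 100   # approximate number of domains, from the notes

# bp -> nm -> µm
domain_length_um = domain_kbp * 1000 * NM_PER_BP / 1000
total_kbp = domain_kbp * domain_count

print(f"One domain: about {domain_length_um:.1f} µm")
print(f"All domains: about {total_kbp} kbp")
```

About 100 domains of 40 kbp each account for roughly 4000 kbp, which is close to the whole E. coli chromosome.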

The loops are bound by some mechanism that may involve proteins and/or RNA but the
mechanism is not clearly understood.

In E. coli, a number of proteins have been isolated which have some similarities with the
eukaryotic chromosomal proteins.

These proteins are HU, IHF (integration host factor), H1 (H-NS) and P. It is suspected that HU is
involved in nucleoid condensation.

The protein H1 probably has effects on gene expression. The amino acid sequence of P has some
similarity with the protamines (the DNA of certain sperm is bound with protamines).

However, the functions of the P protein are not known.


ORGANIZATION OF EUKARYOTIC GENOME

A genome is an organism’s complete set of DNA, comprising nuclear and
mitochondrial DNA.

Each genome contains all of the information needed to build and maintain that
organism.

A human haploid cell, which has 23 nuclear chromosomes and one mitochondrial
chromosome, contains more than 3.2 billion DNA base pairs.

The eukaryotic genome is linear and conforms to the Watson-Crick double-helix
structural model.

It is embedded in nucleosomes, complexes of DNA and histone proteins that
pack together to form chromosomes.

Eukaryotic genomes have the distinctive exon-intron organization of
protein-coding genes, in which coding sequences (exons) are interrupted by
intervening sequences (introns) that are transcribed into RNA but removed
before translation.
Configuration of Eukaryotic genome:

The configuration of the eukaryotic genome includes protein-coding regions,
gene-regulating regions, gene-related sequences, and intergenic or extragenic
DNA, which includes low-copy-number and moderate- or high-copy-number
repetitive sequences. The flow chart representation of the configuration is given.

GENES AND GENE RELATED SEQUENCES

A. PROTEIN CODING REGION (EXONS)


• Protein-coding sequences are the DNA sequences that are transcribed into mRNA and later
translated into proteins.
• The complete protein-coding capacity of the genome is contained within the exome: the part
of the genome formed by exons, the sequences which, when transcribed, remain within the mature
RNA after introns are removed by RNA splicing.
• It consists of ORFs (open reading frames). An ORF is a reading frame that has the potential to
code for a protein or peptide: a stretch of codons that does not contain a stop codon (UAA, UAG,
or UGA). An AUG within the ORF may indicate where translation starts.
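
The ORF definition above can be sketched as a simple scan. This is a minimal illustration under stated assumptions, not a production gene finder: it works on the DNA forward strand only (so the codons are written with T instead of U), requires an ATG start, and uses a hypothetical `min_codons` cutoff.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA
START_CODON = "ATG"                  # DNA equivalent of AUG


def find_orfs(dna: str, min_codons: int = 2):
    """Yield (start, end, sequence) for each ATG...stop stretch on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    `min_codons` counts the codons before the stop codon.
    """
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3, dna[start:i + 3]
                start = None  # look for the next ATG in this frame


# Hypothetical sequence, made up for illustration.
example = "CCATGAAATTTGGGTAACC"
for orf in find_orfs(example):
    print(orf)
```

A fuller search would also scan the reverse complement, giving six reading frames in total.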
GENE REGULATING SEQUENCES

Promoters are combinations of short sequence elements (usually located in the
immediate upstream region of the gene, often within 200 bp of the transcription start site)
which serve to initiate transcription.

They can be subdivided into different components.

The Core promoter directs the basal transcription complex to initiate transcription of
the gene. In the absence of additional regulatory elements it permits constitutive
expression of the gene, but at very low (basal) levels.

Core promoter elements are typically located very close to the transcription initiation site,
at about nucleotide position -45 to +40.
They include: the TATA box, located at position ca. -25, surrounded by GC-rich
sequences and recognized by the TATA-binding protein subunit of TFIID; the
BRE sequence, located immediately upstream of the TATA element at
around -35 and recognized by the TFIIB component;

the Inr (initiator) sequence, located at the start site of transcription and
bound by TFIID; the DPE or Downstream Promoter Element, located at
about position +30 relative to the transcription start site and recognized by TFIID.

The proximal promoter region is the sequence located immediately upstream of the core
promoter, usually from -50 to -200 bp (promoter elements found further upstream would be
said to map to the distal promoter region). They include: GC boxes (also called Sp1 boxes, the
consensus sequence is GGGCGG which is often found in multiple copies within 100 bp of the
transcription initiation site); CCAAT boxes typically located at position -75.
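
Locating such elements in an upstream sequence can be sketched as a motif scan. The GC box and CCAAT box strings below are taken from the text; the TATA pattern is an assumed, simplified consensus, and the 40-bp sequence is made up for illustration. Real promoter prediction uses position weight matrices rather than exact strings.

```python
import re

# GC box and CCAAT box come from the text; the TATA pattern is an
# assumed, simplified consensus (TATA[AT]A[AT]).
MOTIFS = {
    "TATA box": re.compile(r"TATA[AT]A[AT]"),
    "GC box": re.compile(r"GGGCGG"),
    "CCAAT box": re.compile(r"CCAAT"),
}


def scan_promoter(upstream: str):
    """Return (name, position) pairs, with positions relative to the start site.

    `upstream` is the sequence immediately 5' of the transcription start
    site, so its last base is position -1.
    """
    upstream = upstream.upper()
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(upstream):
            hits.append((name, match.start() - len(upstream)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 40-bp upstream sequence, made up for illustration.
toy = "GGGCGGACCAATCGTATAAAAGGCTGACTGACTGACTGAC"
print(scan_promoter(toy))
```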

ENHANCERS are positive transcriptional control elements which are particularly prevalent in
the cells of complex eukaryotes such as mammals but which are absent or very poorly
represented in simple eukaryotes such as yeast.

They serve to increase the basal level of transcription which is initiated through the core
promoter elements.

Their function, unlike that of the core promoter, is independent of both their orientation
and, to some extent, their distance from the genes they regulate.

Enhancers are often contained within a span of only 200-300 bp.


SILENCERS serve to reduce transcription levels. Two classes have been distinguished:
classical silencers (also called silencer elements) are position independent elements that
direct an active transcriptional repression mechanism; negative regulatory elements are
position-dependent elements that result in a passive repression mechanism.

Silencer elements have been reported in various positions: close to the promoter, some
distance upstream and also within introns.

BOUNDARY ELEMENTS (INSULATORS) are regions of DNA, often spanning from 0.5 kb
to 3 kb, which function to block the spreading of the influence of agents that have a
positive effect on transcription (enhancers) or a negative one (silencers,
heterochromatin-like repressive effects).

RESPONSE ELEMENTS modulate transcription in response to specific
external stimuli. They are usually located a short distance upstream of the
promoter elements (often within 1 kb of the transcription start site).

A variety of such elements respond to specific hormones or to intracellular
second messengers such as cyclic AMP.
INTRONS, UTRS
The term intron refers to both the DNA sequence within a gene and the corresponding sequence in
RNA transcripts. At least four distinct classes of introns have been identified:
• Introns in nuclear protein-coding genes that are removed by spliceosomes (spliceosomal introns).
• Introns in nuclear and archaeal transfer RNA genes that are removed by proteins (tRNA introns).
• Self-splicing group I introns that are removed by RNA catalysis.
• Self-splicing group II introns that are removed by RNA catalysis.

Group III introns are proposed to be a fifth family, but little is known about the biochemical
apparatus that mediates their splicing. They appear to be related to group II introns, and possibly to
spliceosomal introns.
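
Whatever the splicing mechanism, the outcome is the same: the introns are cut out and the flanking exons are joined. A minimal sketch of that outcome follows. It assumes the intron coordinates are already known and do not overlap, and the transcript is hypothetical.

```python
def splice(pre_mrna: str, introns):
    """Remove intron intervals and join the flanking exons.

    `introns` is a list of (start, end) pairs, 0-based and end-exclusive,
    assumed to be non-overlapping.
    """
    mature = []
    position = 0
    for start, end in sorted(introns):
        mature.append(pre_mrna[position:start])  # exon before this intron
        position = end                           # skip over the intron
    mature.append(pre_mrna[position:])           # final exon
    return "".join(mature)


# Hypothetical transcript: exons in upper case, introns in lower case.
pre = "AUGGCguaagucagACUGguccagUAA"
print(splice(pre, [(5, 14), (18, 24)]))
```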

In molecular genetics, an untranslated region (or UTR) refers to either of two sections, one on each
side of a coding sequence on a strand of mRNA. If it is found on the 5' side, it is called the 5' UTR (or
leader sequence); if it is found on the 3' side, it is called the 3' UTR (or trailer sequence).
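
The relationship between the two UTRs and the coding sequence can be sketched as a three-way split of the mRNA. The function name, the coordinates and the transcript below are hypothetical and serve only as illustration.

```python
def split_mrna(mrna: str, cds_start: int, cds_end: int):
    """Split an mRNA into (5' UTR, coding sequence, 3' UTR).

    `cds_start` and `cds_end` are 0-based, end-exclusive coordinates of the
    coding sequence, including the start and stop codons.
    """
    if not 0 <= cds_start <= cds_end <= len(mrna):
        raise ValueError("coding-sequence coordinates fall outside the mRNA")
    return mrna[:cds_start], mrna[cds_start:cds_end], mrna[cds_end:]


# Hypothetical transcript, made up for illustration.
five_utr, cds, three_utr = split_mrna("GGCACAUGGCUUAAUUUCC", 5, 14)
print(five_utr, cds, three_utr)
```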
GENE FRAGMENTS
Gene fragments are pieces of genes containing only the exons (those parts of the gene
which actually encode the protein sequence). They are composed of cDNA.

PSEUDOGENES:
Pseudogenes are dysfunctional relatives of genes that have lost their expression
in the cell or their ability to code for protein. Pseudogenes often result from the
accumulation of multiple mutations within a gene whose product is not required for
the survival of the organism. Depending on their DNA sequence characteristics,
pseudogenes are mainly of two types:
Processed pseudogene: It lacks the intervening non-protein-coding sequences
called introns, which are typically spliced out when a messenger RNA (mRNA)
transcript is produced from a gene.
Unprocessed pseudogene: It has all the normal parts of a protein-coding gene,
including introns, but is thought to be incapacitated by errors in its DNA sequence.
LONG NON-CODING RNA (lncRNA)
Long non-coding RNAs (lncRNAs) are a type of non-coding RNA (ncRNA)
that exceeds 200 nucleotides in length. lncRNAs are a relatively abundant
component of the mammalian transcriptome and have been implicated in
several cellular functions, including the regulation of gene transcription
through the recruitment of chromatin-modifying enzymes.
INTERGENIC /EXTRAGENIC DNA
An Intergenic region (IGR) is a stretch of DNA sequences located between
genes. Intergenic regions are a subset of Noncoding DNA. Occasionally some
intergenic DNA acts to control genes nearby, but most of it has no currently
known function. It is sometimes referred to as junk DNA.

In humans, intergenic regions comprise about 75% of the genome, whereas
this number is much less in bacteria (15%) and yeast (30%).
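
Percentages like these come from gene annotations: the intergenic fraction is whatever part of the genome is not covered by any gene. The sketch below shows one way to compute it. The genome length and gene coordinates are hypothetical, and overlapping genes are merged so shared bases are counted once.

```python
def intergenic_fraction(genome_length: int, genes):
    """Return the fraction of the genome not covered by any gene interval.

    `genes` is an iterable of (start, end) pairs, 0-based and end-exclusive.
    Overlapping or touching genes are merged before counting.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(genes):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start  # close the previous block
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)  # extend the current block
    if current_end is not None:
        covered += current_end - current_start
    return 1 - covered / genome_length


# Hypothetical 1,000-bp genome with three annotated genes, two overlapping.
toy_genes = [(100, 300), (250, 400), (700, 850)]
print(f"{intergenic_fraction(1000, toy_genes):.0%} intergenic")
```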

Intergenic regions are different from intragenic regions (or introns), which
are short, non-coding regions that are found within genes, especially within
the genes of eukaryotic organisms.

Intergenic regions do contain functionally important elements such as promoters and
enhancers. They may also contain as yet unidentified genes, such as genes for
noncoding RNAs, which are thought to have regulatory functions.

The simplest level of DNA packing, in which DNA winds around histones, is known as the
nucleosome.
• There are two molecules of each of four types of histone, namely H2A, H2B, H3 and H4. This gives
rise to a complex of 8 proteins named the “histone octamer”.
• The histone octamer is a flattened cylindrical particle of about 11 nm in diameter, and the
thickness of the nucleosome is 5.7 nm.
• The H1 protein is present only as a single copy. The nucleosomes are attached to each other by
a thin stretch of naked DNA known as linker DNA (H1 is associated with the linker DNA).
• At the next level of organization, this nucleofilament has the appearance of beads
on a string, about 11 nm thick.
• Further levels of packing compact the chromosomes progressively, giving rise to fibres of
30 nm, then 300 nm, 700 nm and finally 1400 nm in thickness, which can be seen as
rod-like chromosomes at metaphase of cell division.

1. The simplest level of chromatin is a double-stranded structure of DNA.
2. This DNA forms a complex with proteins called “histone proteins”.
3. This DNA-histone complex is known as a nucleosome.
4. The histone proteins are H1, H2A, H2B, H3 and H4.
5. Each nucleosome consists of eight histone proteins
(two each of H2A, H2B, H3 and H4).
6. These histone proteins form the core of the nucleosome.
7. DNA is wrapped around this core about 1.65 times
(less than 2 turns).
• The nucleosomes form a bead-like structure.
• Two nucleosome beads are attached to each other through a linker region.
• A nucleosome with H1 protein is a chromatosome.
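
To get a feel for the scale of this packaging, the sketch below estimates how many nucleosomes a genome needs. Both constants are assumptions not given in these notes: 147 bp is the commonly cited length of DNA wrapped about 1.65 times around the octamer, and the 50-bp linker is an illustrative average, since linker length varies between species and cell types.

```python
CORE_BP = 147    # assumed length of DNA wrapped ~1.65 times around the octamer
LINKER_BP = 50   # assumed average linker length; real values vary


def estimate_nucleosomes(genome_bp: float,
                         core_bp: int = CORE_BP,
                         linker_bp: int = LINKER_BP) -> int:
    """Estimate how many nucleosomes are needed to package `genome_bp` of DNA."""
    return int(genome_bp // (core_bp + linker_bp))


haploid_human_bp = 3.2e9  # from the notes: more than 3.2 billion base pairs
print(f"About {estimate_nucleosomes(haploid_human_bp):,} nucleosomes")
```

Under these assumptions a haploid human genome needs on the order of 16 million nucleosomes.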
DNA and Chromosomes
