Bioinformatics Genome Organisation
The Nucleoid
The nucleoid (meaning nucleus-like) is an irregularly shaped region within the cell of a
prokaryote that contains all or most of the genetic material.
The length of a genome varies widely, but is generally at least a few million base pairs.
The nucleoid can be visualized on an electron micrograph at high
magnification, where it stands out clearly against the cytosol. Sometimes even
strands of what is thought to be DNA are visible.
The nucleoid can also be seen under a light microscope by staining it with
the Feulgen stain, which specifically stains DNA.
The DNA-intercalating stains DAPI and ethidium bromide are widely used
for fluorescence microscopy of nucleoids.
Experimental evidence suggests that the nucleoid is composed of
about 60% DNA, with smaller amounts of RNA and protein. These RNA and
protein components are likely to be mainly messenger RNA and the
transcription factor proteins that regulate the bacterial genome.
Proteins that help to maintain the supercoiled structure of the nucleic acid are
known as nucleoid proteins or nucleoid-associated proteins, and are distinct
from the histones of eukaryotic nuclei.
Instead of the nucleosome-based packaging used with histones, these proteins
often use other mechanisms, such as DNA looping, to promote compaction.
The Genophore
A genophore is the DNA of a prokaryote. It is commonly referred to as a
prokaryotic chromosome.
The DNA of the nucleoid is organised into supercoiled loops, or domains. The ends of each
loop are anchored in some way that does not allow rotational (supercoiling) changes to
propagate from one domain to another.
If an endonuclease nicks one DNA strand within a domain, that loop loses its supercoiling and
relaxes into a larger, more open loop, but the other domains are not affected. Each domain
contains about 40 kbp (about 13 µm) of DNA.
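The length quoted above follows from the geometry of B-form DNA, which rises about 0.34 nm per base pair (a standard textbook value, assumed here). A minimal sketch of the conversion, using a hypothetical helper `contour_length_um`:

```python
# Assumed rise per base pair for B-form DNA (standard textbook value).
RISE_NM_PER_BP = 0.34

def contour_length_um(base_pairs: int) -> float:
    """Estimate the contour (fully extended) length of B-form DNA in micrometres."""
    return base_pairs * RISE_NM_PER_BP / 1000.0  # convert nm to µm
```

`contour_length_um(40_000)` gives about 13.6 µm, in line with the roughly 13 µm quoted for one domain.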
The loops are anchored by a mechanism that may involve proteins and/or RNA, but this
mechanism is not well understood.
In E. coli, a number of proteins have been isolated that show some similarities to
eukaryotic chromosomal proteins.
These proteins are HU, IHF (integration host factor), H1 (H-NS) and P. HU is thought to be
involved in nucleoid condensation.
Protein H1 probably affects gene expression. The amino acid sequence of P shows some
similarity to protamines (the DNA of certain sperm cells is packaged with protamines).
Each genome contains all of the information needed to build and maintain that
organism.
❑ It contains ORFs (open reading frames). An ORF is a reading frame that
has the potential to code for a protein or peptide: a stretch of codons that
does not contain a stop codon (UAA, UAG or UGA). An AUG within the ORF may
indicate where translation starts.
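The ORF definition can be made concrete with a small scan. This is a minimal sketch, assuming an ORF is taken to run from an ATG to the first in-frame stop codon on the DNA coding strand, where the stop codons are written TAA, TAG and TGA. Only the three forward reading frames are scanned; a full ORF finder would also scan the reverse complement. The name `find_orfs` and its `min_codons` parameter are illustrative, not taken from any particular tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # DNA equivalents of UAA, UAG, UGA

def find_orfs(seq: str, min_codons: int = 1):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start and end are 0-based; end is exclusive and includes the stop codon.
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # step codon by codon; stop before an incomplete final codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC")` returns `[(2, 14, 2)]`: ATG, two sense codons and the TAG stop, in frame 2.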
GENE REGULATING SEQUENCES
The core promoter directs the basal transcription complex to initiate transcription of
the gene. In the absence of additional regulatory elements it permits constitutive
expression of the gene, but at very low (basal) levels.
Core promoter elements are typically located very close to the transcription initiation site,
at about nucleotide position -45 to +40.
They include:
❑ the TATA box, located at about position -25, surrounded by GC-rich sequences and
recognized by the TATA-binding protein (TBP) subunit of TFIID;
❑ the BRE (TFIIB recognition element), located immediately upstream of the TATA box at
around -35 and recognized by TFIIB;
❑ the Inr (initiator) sequence, located at the transcription start site and bound by TFIID;
❑ the DPE (downstream promoter element), located at about position +30 relative to the
transcription start site and recognized by TFIID.
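Promoter positions such as -25 and +30 use a numbering in which +1 is the transcription start site, -1 is the base immediately upstream, and there is no position 0. A minimal sketch of converting between this numbering and string indices, and of looking for a TATA-like motif near -25, is shown below. The TATAWAW consensus (W = A or T) and the -35 to -20 search window are simplifying assumptions, and `to_index`, `to_position` and `find_tata` are hypothetical helpers.

```python
import re

# Assumption: simplified TATA-box consensus TATAWAW (W = A or T).
# The lookahead lets overlapping matches be reported.
TATA_RE = re.compile(r"(?=TATA[AT]A[AT])")

def to_index(pos: int, tss_index: int) -> int:
    """Map a promoter coordinate (+1 = TSS, -1 = first upstream base) to a 0-based index."""
    if pos == 0:
        raise ValueError("promoter numbering has no position 0")
    return tss_index + pos - 1 if pos > 0 else tss_index + pos

def to_position(index: int, tss_index: int) -> int:
    """Inverse of to_index: 0-based string index to promoter coordinate."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset

def find_tata(seq: str, tss_index: int, window=(-35, -20)):
    """Promoter positions of TATA-like motifs whose first base lies in window (inclusive)."""
    lo, hi = window
    positions = (to_position(m.start(), tss_index) for m in TATA_RE.finditer(seq.upper()))
    return [p for p in positions if lo <= p <= hi]
```

With 40 bases upstream of the start site and a TATAAAA motif beginning at index 10, `find_tata(seq, 40)` reports the motif at position -30.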
The proximal promoter region is the sequence located immediately upstream of the core
promoter, usually from about -50 to -200 bp (promoter elements found further upstream are
said to map to the distal promoter region). Elements in this region include GC boxes (also
called Sp1 boxes; the consensus sequence GGGCGG is often found in multiple copies within
100 bp of the transcription initiation site) and CCAAT boxes, typically located around position -75.
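A minimal sketch of scanning the proximal promoter for GC boxes and CCAAT boxes, assuming exact matches to the consensus sequences (real sites vary, and binding-site searches usually use scoring matrices instead). Both strands are scanned; a site on the other strand appears as the reverse complement (CCGCCC for a GC box, ATTGG for a CCAAT box) when reading the top strand. `scan_upstream` and its parameters are hypothetical. Positions follow the usual promoter numbering, in which -1 is the base immediately upstream of the transcription start site.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (returned in uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def scan_upstream(seq: str, tss_index: int, motif: str, upstream: int = 200):
    """Exact-match scan for motif on both strands within `upstream` bp 5' of the TSS.

    seq is the top (coding) strand; tss_index is the 0-based index of position +1.
    Returns sorted (position, strand) pairs, where position is the promoter
    coordinate of the hit's leftmost top-strand base.
    """
    region_start = max(0, tss_index - upstream)
    region = seq[region_start:tss_index].upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for strand, pattern in (("+", motif), ("-", rc)):
        if strand == "-" and rc == motif:
            continue  # palindromic motif: avoid counting one site twice
        for m in re.finditer(f"(?={pattern})", region):
            hits.append((region_start + m.start() - tss_index, strand))
    return sorted(hits)
```

Typical calls would be `scan_upstream(seq, tss, "GGGCGG", upstream=100)` for GC boxes and `scan_upstream(seq, tss, "CCAAT")` for CCAAT boxes.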
ENHANCERS are positive transcriptional control elements which are particularly prevalent in
the cells of complex eukaryotes such as mammals but which are absent or very poorly
represented in simple eukaryotes such as yeast.
They serve to increase the basal level of transcription which is initiated through the core
promoter elements.
Unlike the core promoter elements, enhancers function independently of their orientation
and, to some extent, of their distance from the genes they regulate.
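SILENCERS are negative transcriptional control elements that reduce transcription of the
genes they regulate. Silencer elements have been reported in various positions: close to
the promoter, some distance upstream and also within introns.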
BOUNDARY ELEMENTS (INSULATORS) are regions of DNA, often spanning 0.5 kb
to 3 kb, that block the spreading of influences that have either a positive effect
on transcription (enhancers) or a negative one (silencers,
heterochromatin-like repressive effects).
RESPONSE ELEMENTS modulate transcription in response to specific
external stimuli. They are usually located a short distance upstream of the
promoter elements (often within 1 kb of the transcription start site).
Group III introns are proposed to be a fifth family of introns, but little is known about the
biochemical apparatus that mediates their splicing. They appear to be related to group II
introns, and possibly to spliceosomal introns.
In molecular genetics, an untranslated region (UTR) is either of two sections of an mRNA strand,
one on each side of the coding sequence. The section on the 5' side is called the 5' UTR (or
leader sequence), and the section on the 3' side is called the 3' UTR (or trailer sequence).
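The division of an mRNA into 5' UTR, coding sequence and 3' UTR can be sketched as follows. This assumes, as a simplification, that the coding sequence starts at the first AUG and ends at the first in-frame stop codon; real start-site selection also depends on sequence context. `split_mrna` is a hypothetical helper.

```python
STOPS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna: str):
    """Split an mRNA into (5' UTR, CDS, 3' UTR).

    Assumes the CDS starts at the first AUG and ends with the first in-frame
    stop codon; returns None if no complete CDS is found under that assumption.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            end = i + 3  # include the stop codon in the CDS
            return mrna[:start], mrna[start:end], mrna[end:]
    return None
```

For example, `split_mrna("GGCAUGGCUUAAGCA")` returns `("GGC", "AUGGCUUAA", "GCA")`.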
GENE FRAGMENTS
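Gene fragments are pieces of genes that contain only exon sequences (the parts of a gene
retained in the mature mRNA, including those that encode the protein sequence). They are
composed of cDNA (complementary DNA, copied from mRNA and therefore lacking introns).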
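Because an exon-only copy corresponds to the exons joined end to end, the relationship between a genomic gene and such a copy can be shown by concatenating exon intervals. This is a minimal sketch with a hypothetical `splice` helper and a made-up two-exon gene, assuming 0-based half-open (start, end) exon intervals on the coding strand.

```python
def splice(genomic: str, exons) -> str:
    """Join exon sequences given as 0-based half-open (start, end) intervals.

    Intervals are sorted first; overlapping exons are rejected as malformed input.
    """
    ordered = sorted(exons)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            raise ValueError(f"exons ({s1}, {e1}) and ({s2}, {e2}) overlap")
    return "".join(genomic[start:end] for start, end in ordered)

# Hypothetical gene: exon 1, one intron, exon 2.
exon1, intron, exon2 = "ATGAAA", "GTAAGTTTTCAG", "GGGTAA"
genomic = exon1 + intron + exon2
print(splice(genomic, [(18, 24), (0, 6)]))  # prints ATGAAAGGGTAA
```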
PSEUDOGENES
Pseudogenes are dysfunctional relatives of genes that have lost the ability to be expressed
in the cell or to code for protein. Pseudogenes often result from the
accumulation of multiple mutations within a gene whose product is not required for
the survival of the organism. Depending on their DNA sequence characteristics,
pseudogenes are mainly of two types:
Processed pseudogene: lacks the intervening non-protein-coding sequences called introns,
which are normally spliced out when a messenger RNA (mRNA) transcript is produced from a
gene. Processed pseudogenes arise when an mRNA is reverse-transcribed and the resulting
DNA copy is inserted back into the genome.
Unprocessed pseudogene: has all the normal parts of a protein-coding gene, including
introns, but has been incapacitated by mutations such as premature stop codons or frameshifts.
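Pseudogenes lose their coding ability through mutations; two common disabling signs are premature in-frame stop codons and frameshifts. A minimal sketch of screening a gene copy for them, assuming the input is the copy's putative coding sequence starting at its ATG, read on the coding strand with DNA stop codons TAA, TAG and TGA. `disabling_signs` is a hypothetical helper, and a length that is not a multiple of three only hints at a frameshift if the copy is expected to span the whole coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def disabling_signs(cds: str) -> dict:
    """Report simple signs that a putative coding sequence cannot encode a full protein."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return {
        # in-frame stop codons before the final codon (0-based codon indices)
        "premature_stops": [n for n, c in enumerate(codons[:-1]) if c in STOP_CODONS],
        # a frameshift hint, meaningful only if the copy should span the whole CDS
        "length_not_multiple_of_3": len(cds) % 3 != 0,
    }
```

An intact copy such as `"ATGAAAGGGTAA"` shows no premature stops, while the point mutant `"ATGTAAGGGTAA"` reports a premature stop at codon index 1.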
LONG NON-CODING RNA (lncRNA)
Long non-coding RNAs (lncRNAs) are a type of non-coding RNA (ncRNA)
that exceed 200 nucleotides in length. lncRNAs are a relatively abundant
component of the mammalian transcriptome and have been implicated in
several cellular functions, including the regulation of gene transcription
through the recruitment of chromatin-modifying enzymes.
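A toy classifier can make the 200-nucleotide length rule concrete. Beyond the text, this sketch assumes that a candidate lncRNA should also lack a long open reading frame; the 100-codon cutoff (`max_orf_codons`) is a hypothetical parameter chosen for illustration, and only the given strand is scanned. `longest_orf_codons` and `looks_like_lncrna` are hypothetical helpers.

```python
STOPS = {"UAA", "UAG", "UGA"}

def longest_orf_codons(rna: str) -> int:
    """Length in codons (stop excluded) of the longest AUG...stop ORF on this strand."""
    rna = rna.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOPS:
                best = max(best, (i - start) // 3)
                start = None
    return best

def looks_like_lncrna(rna: str, max_orf_codons: int = 100) -> bool:
    """Length rule from the text (> 200 nt) plus a hypothetical ORF-length cutoff."""
    return len(rna) > 200 and longest_orf_codons(rna) < max_orf_codons
```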
INTERGENIC/EXTRAGENIC DNA
An intergenic region (IGR) is a stretch of DNA located between
genes. Intergenic regions are a subset of non-coding DNA. Occasionally some
intergenic DNA acts to control nearby genes, but most of it has no currently
known function. It is sometimes referred to as junk DNA.
Intergenic regions are different from intragenic regions (or introns), which
are non-coding regions found within genes, especially within
the genes of eukaryotic organisms.
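Given gene coordinates, intergenic regions can be computed as the gaps between genes. A minimal sketch, assuming genes are 0-based half-open (start, end) intervals on one linear sequence, with strand ignored and overlapping genes merged; `intergenic_regions` is a hypothetical helper. Only gaps lying between two genes are reported, matching the definition of an intergenic region as DNA located between genes.

```python
def intergenic_regions(genes):
    """Return the gaps between genes as 0-based half-open (start, end) intervals.

    Overlapping or adjacent genes are merged, so no gap is reported between them.
    """
    regions = []
    merged_end = None  # end of the gene block seen so far
    for start, end in sorted(genes):
        if merged_end is not None and start > merged_end:
            regions.append((merged_end, start))
        merged_end = end if merged_end is None else max(merged_end, end)
    return regions
```

For genes at `(0, 300)`, `(250, 400)`, `(500, 900)` and `(1000, 1200)`, the first two overlap and are merged, leaving the intergenic regions `(400, 500)` and `(900, 1000)`. For a circular bacterial chromosome, the gap spanning the coordinate origin would need separate handling.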