Analysis of Y-Chromosome Polymorphisms in Pakistani Populations
BY
Sadaf Firasat
Karachi, Pakistan
2010
TABLE OF CONTENTS
Title page
Acknowledgements
List of Figures
Summary
Introduction
Literature Review
Results
Discussion
Conclusions
References
Appendix
ACKNOWLEDGEMENT
I thank Prof. Dr. Syed Qasim Mehdi H.I. S.I., for his support, encouragement
and for providing all the facilities for doing scientific work in his laboratory.
The work presented in this thesis was done under the supervision of Dr.
Qasim Ayub T.I. It is great pleasure for me to acknowledge the keen interest, advice,
patient guidance and kindness that I have received from him during the course of this
work.
I would like to thank Dr. Shagufta Khaliq (PoP) for teaching me all the molecular
genetics laboratory techniques, and Dr. Aiysha Abid for her comments on this manuscript
and suggestions for its improvement.
I am also grateful to Mrs. Ambreen Ayub for her help in making the contour
map.
I thank my colleague Ms. Sadia Ajaz for her help and cooperation in
proofreading the thesis.
It has been an honor for me to work at SIUT and I thank Prof. Dr Adeeb Rizvi
H.I. S.I., Director, SIUT, for his constant support and guidance.
Finally, I would like to thank my parents; without their love and support the
completion of this work would not have been possible.
LIST OF TABLES
VII. Y lineages found in the three Punjabi castes examined in this study.
IX. Population pairwise FSTs between Pakistani ethnic groups computed
from Y haplogroup frequencies. FST p values (based upon 110
permutations) are given above the diagonal, with * indicating significant
pairwise differences.
XIII. Y-STR data of clade B lineages in Pakistani and African populations.
LIST OF FIGURES
XI. A plot of the first two principal coordinates based upon the
analysis of Y haplogroup frequencies in Pakistani and Greek
samples (1 = this study; 2 = Francalacci et al., 2003) using
comparable biallelic markers.
SUMMARY
-1-
The data presented in this thesis provides a comprehensive report on Y
insights into the genetic variation in Pakistan in a global context and also sheds light
summarized as follows:
The results suggest that within Pakistan male genetic relationships are
the isolation of the Hunza Burusho in the mountains of northern Pakistan has led to
the preservation of their language it has not made them genetically distinct in
Pakistan show evidence of admixture mostly with Central/South Asian and European
populations. This is illustrated by the fact that the major haplogroups such as E*, J*
and R*, that are frequent in west Asians and Europeans, together constitute 65% of
the total. Haplogroups L1 and R2 are shared with populations from India and
from China:
East Asians, are rare, or absent in the Pakistani populations and constitute < 1.5 % of
the total. Populations living in these mountain valleys such as the Hunza Burusho,
Balti and Kashmiri are all genetically closer to other ethnic groups in Pakistan. This
low prevalence, or absence, of East Asian haplotypes in Pakistan indicates that the
Karakoram Mountains, which separate Pakistan and China, form a formidable barrier
to gene flow from the north. The Hazara are the only population with significant East
Asian ancestry but historical records indicate that they did not cross this geographical
and Central Asia. These probably replaced the indigenous Y haplogroups which are
now mostly found in South Indians and isolated populations in the Andaman Islands.
Three populations (Burusho, Kalash and Pathan) also claim Greek ancestry
following Alexander's invasion of the subcontinent. However, the results shown here
provide strong support only for a minor Greek genetic contribution to the Pathan
gene pool.
haplogroup C3 Y chromosomes in the Hazara population has been linked to the male
Mongolia and are observed at a frequency of 60% in a much larger sample of Hazara
males from northern and southern Pakistan that were analyzed in this study.
Although this haplogroup was also observed in the Burusho (8.2%), these
samples did not share the star haplotype, pointing towards separate origins for these
populations. Historical records also support the genetic relatedness between East
4. The Kalash as genetic outliers:
This study also demonstrates that the Kalash have a distinct genetic identity
within Pakistan. Located in the remote valleys of the Hindu Kush Mountains they
show significant Caucasian ancestry but also have a high proportion of population
specific haplogroup L3a that is not found elsewhere in Pakistan. Their genetic
Future Prospects:
population diversity and structure in this region and provides a basis for future work
in this field.
INTRODUCTION
-2-
Where do we come from? What are we? Where are we going? These provocative
questions, as framed in the title of the French artist Paul Gauguin's painting, have
always aroused human curiosity. Using evidence from archaeology, fossils and
Human evolutionary history begins with the appearance of our genus about
2.5-1.5 million years ago (MYA), the earliest evidence of which has been found in
Africa (Klein, 1989). With the passage of time, various species of the genus Homo
floresiensis (Brown et al., 2004; Gabunia and Vekua, 1995; Swisher et al., 1994), all
of whom are now extinct with the exception of modern H. sapiens, the last fully
developed species that appeared about 100,000 years ago in East Africa (Klein,
1989; Rightmire, 1989). The demise of our early ancestors has been attributed to
harsh weather conditions or the difficulty in finding food and other life necessities.
humans arose in Africa and several waves of migrations help explain their passage
out of Africa. Evidence from fossils and archaeological remains suggest that
favorable. The discovery of 125,000-year-old artifacts on Eritrea's Red Sea coast (Walter
et al., 2000) suggests that people from the Horn of Africa moved across the Arabian
peninsula to the southern part of the Red Sea. They reached southern Asia,
traveling further east to Australia (Stringer, 2000) around 50-60 thousand years ago
(KYA). The evidence found at Skhul and Qafzeh, in modern-day Israel, dating to 100
KYA suggests that in another wave of migration humans crossed the Red Sea and
entered the Levantine region 47 KYA. From Arabia, people moved west and
east and reached Western Europe and Siberia about 40 KYA and East Asia about 39
races of modern humans that are characterized by the differences in their physical
Fossil and archaeological evidence in favour of an African origin for modern
Bowcock et al., 1991, 1994; Cann et al., 1987; Cavalli-Sforza et al., 1994; Horai et
al., 1995; Jorde et al., 1995; Knight et al., 1996; Lahr and Foley, 1994; Leakey, 1994;
Mountain et al., 1994; Perez-Lezaun et al., 1997; Ruvolo et al., 1993; Scozzari et al.,
1988; Shriver et al., 1997; Stringer and Andrews, 1988; Tattersall, 1997; Tishkoff et al.,
1996). This biological evidence has provided valuable insights and, in association
Blood groups were the first markers to be analyzed in human populations, soon
after the discovery of the ABO blood groups (Landsteiner, 1901). Variations in these
blood groups were analyzed among First World War soldiers and the slaves from
different nations (Hirszfeld and Hirszfeld, 1919). This was followed by the discovery
(Dausset, 1954; Grubb and Laurell, 1956; Payne et al., 1964) and serum proteins
(Harris, 1966). All these markers collectively contributed to our understanding of the
WHAT IS DNA?
In 1953 the celebrated Nobel Prize winners Watson and Crick described the
double helical chemical structure of DNA (Watson and Crick, 1953) and laid the
foundations for the development of DNA based genetic markers that have now
become the hallmark of research into our past history. The simple but elegant
structure of DNA that they described has two anti-parallel polynucleotide chains with
a sugar-phosphate backbone. The nucleotide bases in DNA are of only four kinds:
adenine (A), guanine (G), cytosine (C) and thymine (T), which pair strictly through
hydrogen bonding, A with T and G with C. The sequences of these bases in the
polynucleotide chain dictate the structure and function of proteins and every
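The base-pairing rules and the anti-parallel arrangement of the two chains described above can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and serve only as an illustration; they are not part of the methods used in this study.

```python
# Watson-Crick pairing as stated in the text: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read 5'->3'.

    Because the two chains are anti-parallel, the partner strand is the
    complement read in the opposite direction.
    """
    return complement(strand)[::-1]


if __name__ == "__main__":
    example = "ATGCGT"  # hypothetical sequence, for illustration only
    print("strand            :", example)
    print("complement        :", complement(example))
    print("reverse complement:", reverse_complement(example))
```

Applying `reverse_complement` twice returns the original strand, which reflects the fact that each chain fully specifies the other.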
In humans DNA is present inside the cellular nucleus and the mitochondria,
an extranuclear organelle. In the mitochondria the DNA is small, circular and double
stranded, with a length of 16,569 base pairs (bp) (Anderson et al., 1981; Ruiz-Pesini
et al., 2007). It contains only 37 genes but has been extremely useful in tracing
the maternal origin of human populations because it has three important
characteristics:
The human nuclear genome consists of a double stranded DNA molecule that
identical in both male and female. One pair, the sex chromosomal pair, is different in
the sexes. Females have two X chromosomes whereas males have one X
chromosome, which they inherit from the mother, and one Y chromosome, which is
paternally inherited. The Y chromosome is passed from a father to his son and does
not undergo inter-chromosomal recombination for most of its length. This feature has
Sequencing Consortium, 2004) has revealed that enormous variation exists in our
genome. Only 2-3% of our genome codes for functional molecules such as proteins
and RNA. The intergenic regions, which constitute 97-98% of the sequence,
to large scale DNA copy number and sequence variants. All are remnants of our
evolutionary past and provide valuable insights about what makes us human.
The human genome contains three billion pairs of nucleotides. The sequence
of the nucleotides that constitutes the DNA strand carries all the genetic information
required for the survival of an organism. The gene, which codes for a protein product
biological functions during the development of an individual from a fertilized egg and
throughout life. Recent estimates show that the human nuclear genome contains
polymorphism. It can be categorized on the basis of its size as either a large or small
scale mutation. Large scale mutations can also include abnormalities such as an
scale mutations refer to the alteration in the sequence of the nucleotides. This
includes the replacement of one nucleotide with another, or the deletion, or insertion,
of any of the four nucleotides resulting in a new allele for a particular gene. In some
instances these new alleles may result in disease or improve the fitness of the
organism. In most cases they are neutral changes and do not play any beneficial or
detrimental role.
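The three kinds of small scale mutation named above (replacement, deletion and insertion of a nucleotide) can be sketched as simple string operations. This is only an illustration under stated assumptions: the reference sequence, the positions (0-based) and the function names are hypothetical.

```python
def substitute(seq: str, pos: int, new_base: str) -> str:
    """Replace the nucleotide at a 0-based position with another one."""
    return seq[:pos] + new_base + seq[pos + 1:]


def delete(seq: str, pos: int, length: int = 1) -> str:
    """Remove `length` nucleotides starting at a 0-based position."""
    return seq[:pos] + seq[pos + length:]


def insert(seq: str, pos: int, bases: str) -> str:
    """Insert one or more nucleotides before a 0-based position."""
    return seq[:pos] + bases + seq[pos:]


if __name__ == "__main__":
    reference = "ACGTACGT"  # hypothetical reference allele
    # Each call yields a new allele that differs from the reference.
    print("substitution:", substitute(reference, 2, "A"))
    print("deletion    :", delete(reference, 2))
    print("insertion   :", insert(reference, 2, "T"))
```

Each function returns a new string and leaves the reference untouched, mirroring the way a mutation creates a new allele alongside the existing one.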
Any mutation in the germ line DNA sequence is inherited in a stable form and
has the ability to pass from one generation to the next. Mutations can occur either at
the time of recombination during meiosis, when the parental DNA is transmitted to
their progeny or during mitotic cell division that occurs throughout the life time of an
individual. They occur due to errors in DNA replication during cell division. Copying
DNA requires great accuracy for the insertion of the correct nucleotide to the growing
polynucleotide strand. The DNA replication enzymes, the DNA polymerases, have
proofreading activity that reduces the error rate. The 3'-5' exonuclease activity of these
enzymes removes one incorrect nucleotide at a time from the 3' hydroxyl terminus
until the correct nucleotide appears. Despite these effective DNA proofreading and
repair mechanisms, replication errors occur at a rate of about 10^-9 to 10^-11 per incorporated
In humans 99.9 % of the genome is identical and only 0.1-2.0% of the DNA
height, facial morphology, skin, eye and hair colour. These variations occur due to
frequencies (usually > 1%) in any given population. To date many types of
polymorphism have been discovered in the coding regions as well as in the non-
coding regions of the human genome and they form the basis of all current genetic
markers. They are used not only to unravel our evolutionary past but to genetically
The non-coding DNA sequences that constitute the bulk of the human
genome are dispersed throughout the genome. The exact function of these
non-coding regions remains unknown, and this non-genic DNA is also known as selfish or
junk DNA.
Several recent findings have shown the dynamic nature of these regions, which
play a major role in gene regulation. The junk DNA does not encode any product
used by the cell. It has a tendency to repeat its sequences many times. In some
instances this interferes with the function of other genes or increases their copy
nucleotide, in the form of an array or a block of bases, scattered throughout the
genome.
According to their size, the human polymorphisms can be classified as single
or insertions. The base substitutions can be classified into two groups namely
Collins and Jukes (1994) the transition mutation occurs frequently in the mammalian
SNPs are dispersed throughout the genome such as in the promoter region,
single nucleotide polymorphism database the human genome contains more than 55
million SNPs. More than 6 million SNPs lie within genes (Serre and Hodson, 2006).
SNPs were the first generation of polymorphic genetic markers. Their use
was realized in the late 1970s with the development of restriction fragment length
polymorphism (RFLP) analysis (Roberts and Murray, 1976). An RFLP occurs when a mutation
causes the loss or gain of a recognition site for a restriction enzyme. Restriction
enzymes were discovered in 1968 (Meselson and Yuan, 1968) and are of three
types, designated Type I, II and III. Among them, Type II restriction enzymes are
most useful for genotyping. These restriction endonucleases recognize specific DNA
sequences and cut the DNA within, or near, the recognition sequence. The first
polymorphism in a restriction enzyme site was observed for the human globin
structural gene with the restriction enzyme HpaI (Kan and Dozy, 1978).
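How the loss of a recognition site changes the fragment pattern can be shown with a small in silico digest. The sketch below assumes the HpaI recognition sequence GTTAAC with a blunt cut between GTT and AAC; the two alleles are invented for illustration and do not correspond to the globin locus, and the function name `digest` is hypothetical.

```python
HPAI_SITE = "GTTAAC"  # HpaI recognition sequence (palindromic)
CUT_OFFSET = 3        # assumed blunt cut between GTT and AAC


def digest(seq: str, site: str = HPAI_SITE, cut_offset: int = CUT_OFFSET) -> list[int]:
    """Return the fragment lengths after cutting `seq` at every site.

    The site is palindromic, so searching one strand is sufficient.
    """
    lengths = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths


if __name__ == "__main__":
    # Hypothetical alleles: allele B carries a single base change (A to G)
    # that destroys the second recognition site.
    allele_a = "AAAA" + "GTTAAC" + "CCCCCC" + "GTTAAC" + "TTTT"
    allele_b = "AAAA" + "GTTAAC" + "CCCCCC" + "GTTGAC" + "TTTT"
    print("allele A fragments:", digest(allele_a))
    print("allele B fragments:", digest(allele_b))
```

The two alleles have the same total length but yield different fragment sizes, which is the difference that is scored on a gel in an RFLP assay.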
Since then many SNP genotyping methods such as heteroduplex analysis
detector arrays (Dong et al., 2001; Hacia et al., 1999; Hacia and Collins, 1999;
Marshall and Hodgson, 1998; Qi et al., 2001; Ramsay, 1998; Wang et al., 1998;
Yoshino et al., 2001), high-throughput SNP genotyping (Jenkins and Gibson, 2002,
McClay et al., 2002), and molecular beacon methods (Mhlanga and Malmberg, 2001)
have been developed to construct high-density SNP maps. More recently massively
genomes and the 1000 Genomes Project aims to catalogue SNPs occurring at
In the present century SNPs have become the markers of choice for many
applications in the forensic sciences and medical and evolutionary genetics. The
recent discovery of large numbers of SNPs and the determination of their allelic
biomedical sciences. Studies have identified genetic variation due to SNPs as one of
the factors associated with susceptibility to many common diseases such as heart
disorders, blood pressure (Koschinsky et al., 2001), Type II diabetes (Tsunoda et al.,
The discovery of millions of SNPs has greatly aided the field of pharmaco-
genotype. The relationship between SNPs, disease and medicine is not the
same among various populations or even among the individuals within a population.
Due to the presence of variations in the target genes or drug metabolizing enzymes,
some patients suffering from the same disease exhibit a life-threatening adverse
reaction to a particular medicine while others fail to show any adverse reaction.
Some show intermediate responses for the same drug. The genotype of an
individual based upon SNP markers will soon allow the design of different new and
SNPs have also helped in understanding how the modern humans and their
mitochondrial DNA have been used to describe the origins and migrations of our
that occur due to differences in the number of copies of a particular genomic region.
They evolve due to the duplication or deletion of DNA segments ranging several
CNVs were first uncovered among normal, healthy human individuals
soon after the completion of the Human Genome Project and many studies have
variation, contributing to our uniqueness (Feuk et al., 2005; Hinds et al., 2006; Iafrate
et al., 2004; Sebat et al., 2004; Sharp et al., 2005; Stefansson et al., 2005; Tuzun et
al., 2005). It is estimated that about 12% of the human genome and thousands of
2003; McCarroll et al., 2006; Repping et al., 2006). They have been shown to
influence phenotypic variation, gene expression and gene dosage and are
the copy number of EGFR gene increases risk for non-small cell lung cancer
(Cappuzzo et al., 2005). Another study has demonstrated that the high copy number
et al., 2005). Low copy number of FCGR3B (CD 16 cell surface immunoglobulin
receptor) can increase susceptibility to systemic lupus erythematosus and similar
oligonucleotides. This technology has been useful in the detection of new CNVs and
their association with normal and disease phenotypes (Carter, 2007). In the most
complete worldwide analysis (Redon et al., 2006) the first-generation CNV map was
hybridization. In this survey a total of 1,447 copy number variable regions (CNVRs),
covering 360 megabases (12% of the genome) were identified in 270 individuals that
had been previously surveyed for SNPs (The International HapMap Consortium,
2005).
SATELLITE DNA:
Satellite DNA is located mainly in the darkly stained regions of chromosomes referred to as
heterochromatin. Its exact function is unclear (Csink and Henikoff, 1998; Henikoff et
al., 2001) but transcription is limited in this region and it is thought to play a role in the
structure and function of centromeres (Grimes and Cooke, 1998). It consists of large
blocks of short tandem repeats. Although genotyping these repeats is not easy, satellite
DNA has been used in human evolutionary studies (Oakey and Tyler-Smith, 1990).
MINI-SATELLITE DNA:
The mini-satellite DNA or the variable number of tandem repeats (VNTR)
(Nakamura et al., 1987) was first identified in the human myoglobin gene (Jeffreys et
al., 1985). It consists of intermediate size arrays of short tandem repeats and
thousands of arrays ranging from 0.1-20 kilobases (kb) are found in the euchromatic
Most mini-satellites are rich in GC content and clustered towards the ends of
the chromosomes (i.e. telomeres) (Royle et al., 1988). The majority of mini-satellite
DNA is transcriptionally inactive, but in some cases it is expressed, for example
heterozygosity values between 70 and 90% (Jeffreys et al., 1985) and their mutation rate
is also higher in comparison to the classical genetic markers (Jeffreys et al., 1988). It
is estimated that mutations occur at a frequency of 1-2% per gamete per generation,
resulting in a new variant with a different repeat copy number in individuals and
populations. Baird et al. (1986) were among the first to analyze two VNTR loci,
MICROSATELLITE DNA:
The microsatellites, also referred to as short tandem repeats (STRs)
stretches. The term microsatellite was coined by Litt and Luty (1989) and Edward et
STRs are composed of 1-6 base pair repeat units that follow each other in
tandem (Tautz, 1989). Depending upon the number of bases in the repeat unit they
are classified as mono-, di-, tri-, tetra-, penta-, or hexa-nucleotide repeats. The
tetra-nucleotide repeat (GATA) and the array of TG repeats were the first di-nucleotide
STRs identified in the human delta and beta globin genes (Miesfeld et al., 1981).
(Hamada and Kakunaga, 1982) and several other di-nucleotide repeats (GT or CA)
were described by these groups (Epplen et al., 1982; Hamada et al., 1982)
respectively. These repeats are found in the euchromatin region of the chromosomes
STRs constitute about 2% of the human genome and are more frequent than
100,000 in the human genome. Both mini-satellites and STRs can be produced by
unequal crossing over and by DNA slippage during replication (Kruglyak et al.,
1998; Toth et al., 2000). New STR alleles are thought to arise mostly by DNA
slippage during replication (Di Rienzo et al., 1994; Jeffrey et al., 1993; Kimmel and
In humans the di-, tri- and tetra-nucleotide repeats are more frequent in
comparison with the large polymorphic repeats. Among all classes of STRs the most
frequent are the di-nucleotide repeats, which comprise 0.5% of the genome. They are
highly polymorphic and tend to mutate more rapidly than the tri- and
tetra-nucleotides (Chakraborty et al., 1997; Webster et al., 2002). CA/TG
repeats are present at a frequency of 1 per 36 kb whereas AT/TA motifs are
present at 1 per 50 kb. The less common AG/CT arrays are present at a frequency
of 1 per 125 kb. The rarest di-nucleotide repeats are CG/GC repeats, which are
present at 1 per 10 Mb. Among the tri-nucleotides the most frequently found arrays
are the ACC repeats, followed by AGC, ACT and the less common ACG.
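The definition of an STR given above (a repeat unit of 1-6 bp repeated in tandem and classified by the length of that unit) lends itself to a short illustrative search. The following Python sketch uses a regular expression with a back-reference; the function name `find_strs`, the minimum of four copies and the test sequence are assumptions made for this example and are not the criteria used for the loci typed in this study.

```python
import re

UNIT_NAMES = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def find_strs(seq: str, min_repeats: int = 4) -> list[tuple[int, str, int, str]]:
    """Find tandem repeats with a 1-6 bp unit repeated at least `min_repeats` times.

    Returns (start, unit, copies, class) tuples. The lazy quantifier makes the
    pattern prefer the shortest unit, so CACACACA is reported as (CA)4 and
    not as (CACA)2.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies, UNIT_NAMES[len(unit)] + "-nucleotide"))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence containing a (CA)6 and a (GATA)5 array.
    sequence = "TTG" + "CA" * 6 + "GGTC" + "GATA" * 5 + "CT"
    for start, unit, copies, kind in find_strs(sequence):
        print(f"pos {start}: ({unit}){copies} {kind} repeat")
```

Alleles at an STR locus differ in the number of copies reported for the same unit, which is why the copy number, rather than the sequence itself, is recorded as the genotype.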
Genetic variation at STR loci makes them very useful genetic markers that
cases (Budowle et al., 1998; 2001; Gill et al., 1994), linkage analysis of disease
(Dietrich et al., 1992; Hearn et al., 1992; Jeffreys et al., 1985; Jeffreys and Pena,
1993; Queller et al., 1993; Todd et al., 1991) and as a powerful tool for the
investigation of human past and diversity (Bowcock et al., 1994). The multi-allelic
powerful, accurate and informative tool that has aided in reconstructing the
evolutionary history of man and exposed the relationship between various world
SNPs. The average mutation rate for tri- and tetra-nucleotide repeats at autosomal
loci is estimated between 7.0 x 10^-4 and 9.3 x 10^-4 (Zhivotovsky et al., 2000) and for
Y-chromosomal STRs estimates range between 2.4 x 10^-3 and 6.9 x 10^-4 per locus,
per generation depending upon whether the mutation rate is observed (Kayser et al.,
Although there is some evidence that the STR loci are neutral in nature and
not involved in any biological function, many studies show that some STRs, such
1984). Many of them have binding sites for specific nuclear proteins (Richards et al.,
1986). The tri-nucleotide STR loci are associated with several genetic diseases.
The first such association of the tri-nucleotide motif CCG was reported with fragile X
syndrome (Fu et al., 1991; Kremer et al., 1991; Verkerk et al., 1991). In normal
52 to 1000 repeats. The meiotic instability of these repeats is associated with over
a dozen human diseases, such as X-linked spinal and bulbar muscular atrophy
(SBMA) (La Spada et al., 1991) and myotonic dystrophy (Brook et al., 1992; Fu et al.,
1992).
TRANSPOSABLE ELEMENTS:
The other class of repetitive DNA includes the interspersed repetitive non-
coding DNA that occupies 45% of the human genome (International Human Genome
also been linked with certain diseases. These are derived from mobile DNA
sequences, also called transposable elements (Prak and Haig, 2000; Smith, 1999).
These elements have an ability to migrate from one region of the human genome and
integrate into another region (Prak and Haig, 2000; Smith, 1996). Until now there is
D) DNA transposons.
Depending upon the transposition mechanism these four groups are broadly
1) Retrotransposons or retroposons:
2) DNA transposons.
Retrotransposons are transposable elements that make copies of themselves using
reverse transcriptase and include LINEs, SINEs and LTRs. Cellular reverse
transcriptases transcribe mRNA into neutral cDNA which is then integrated in any
In DNA transposons the DNA sequences are excised and directly integrated
into another place in the genome by a cut and paste mechanism. DNA transposons
account for 3% of the human genome and virtually all human DNA transposons are
The most successful and ancient transposable elements are the LINEs.
These elements first appeared in eukaryotic genomes about 600 million years
ago (Malik et al., 1999) and collectively comprise about 21% of the human genome.
These elements are subdivided into three distantly related families: LINE 1, LINE
2, and LINE 3. In comparison with LINE 2 and LINE 3 elements, the LINE 1
element is the only family, which is still being actively transposed (International
fragments reside in the human genome and make up 17% of the genome (Lander et
al., 2001; Smith, 1996). These elements are mostly found in AT-rich regions
(Korenberg and Rykowski, 1988). The LINE 1 element consists of two open reading
frames, ORF1 and ORF2. ORF1 encodes a 40 kilodalton (kDa) RNA-binding protein
while ORF2 encodes a 150 kDa protein, which has both endonuclease and reverse
transcriptase activity (Feng et al., 1996; Mathias et al., 1991). The LINE 1 transcript
moves from the nucleus to the cytoplasm where it is translated to yield ORF proteins.
The LINE 1 RNA assembles with its own encoded proteins and re-enters the nucleus,
where the L1 endonuclease cleaves one strand of DNA, preferably at the 5'-TTTT.A-3'
consensus site (Cost and Boeke, 1998; Feng et al., 1996; Jurka, 1997; Morrish et
al., 2002) and the reverse transcriptase uses the same site to prime reverse
transcription from the 3' end of the LINE RNA. At the time of integration, in most
In the human genome about 99.8% of the copies of LINE 1 elements are
defective (Gilbert et al., 2002; Kazazian and Moran, 1998; Myers et al., 2002;
Ostertag and Kazazian, 2001; Sassaman et al., 1997) with an average size of 900 bp
still functional and produce new copies (Sassaman et al., 1997). At least 1 in every
50 humans has a new genomic L1 insertion. These occur in the parental germ cell or
during early embryonic development (Goodier et al., 2001; Luningprak et al., 2003;
Ostertag et al., 2002). The functional significance of this occurrence is unknown but
these new copies can be used as genetic markers such as the L1 insertion in the
2000). Sometimes these insertions can lead to disease as in the case of hemophilia
SINEs comprise 13% of the human genome. These sequences are 100-400 bp
long and include the Alu repeats, which are dispersed throughout the human
genome. Unlike LINE elements they do not encode any protein and use the LINE
machinery for their transposition (Kajikawa and Okada, 2002). All, except one, of the
families of SINE elements originated from tRNA. The only exception is the Alu family,
which originated from signal recognition particle component (SRP 7SL) RNA (Ullu
The Alu elements are about 300 bp long and they constitute 10.7% of the
human genome. The Alu insertion has been postulated to have occurred early in
primate evolution, about 30-65 MYA (Batzer et al., 2002; Deininger
et al., 1992; Deininger and Daniels, 1986; Deininger and Slagel, 1988; Kapitonov,
1996; Labuda et al., 1991; Shen et al., 1991). A subfamily of these Alu repeats
termed as human specific (HS) repeats (Batzer et al., 1990) appeared in the human
genome record within the last 6 million years (Batzer et al., 1991; Batzer and
Deininger, 1991). Approximately 75% of these HS repeats are present in all human
populations indicating that they were inserted early in human evolution and were
completely fixed before the migration of humans from Africa (Deininger et al., 1999).
Alu repeats have also proven to be extremely useful genetic markers (Myers
et al., 2002; Watkins et al., 2001). About 25% (400 sites) of these recent Alu
are associated with human diseases such as hypertension (Barley et al., 1996; Duru
et al., 1994; Jeng et al., 1997), myocardial infarction (Ludwing et al., 1995),
1993).
In the human genome 8.5% of repetitive DNA belongs to LTR which
This human ERV (HERV) contains many sub-families and shows a small number of
polymorphisms (Turner et al., 2001). Many of the LTRs are defective and
transposition has been rare. The non-autonomous element of LTR consists of the
MaLR family accounts for about 3.8% of the human genome. This family lacks the pol gene
Over the past decade the genetic variation of these DNA based markers has
been exploited to unravel the paternal and maternal lineages and the relationship
et al., 1999 a, b and c). The current study was designed to use polymorphic markers
to uncover the genetic history of ethnic groups residing in present-day Pakistan and
to provide a basis for further analyses of these populations in genetic association and
The modern state of Pakistan was established on August 14, 1947, but the
throughout human history. The country lies on the postulated southern coastal route
The earliest evidence indicates that humans were present in this region
around 100,000-150,000 years ago but the fossil record is non-existent. Neolithic
sites have been found in the Peshawar Valley in the north-west and at Mehrgarh, in
the south-east in the province of Baluchistan (Jarrige, 1991). The evidence found at
Mehrgarh indicates a modern human settlement dating to around 7,000 B.C. This
predates the region's other earliest civilizations, the Indus Valley civilizations found
throughout the sub-continent with major centres at Harappa and Mohenjo-Daro in
Pakistan. This civilization flourished in the 3rd and 2nd millennia B.C. (2,500-1,500
B.C.).
Due to its geostrategic importance as the gateway to India this region was
invaded many times. Around 1,500 B.C. the Indo-European speaking nomadic
pastoral tribes, the so-called Aryans, entered this region through the Hindu Kush
speakers who were thought to be there initially. Their rule lasted from about 1,500
B.C. to 500 B.C., when this region was occupied by the Persian Empire. In 326 B.C.
this region was conquered by Alexander the Great. Subsequently it was conquered
by the Mauryas (305 B.C.), Saka (97 B.C.), Arabs (711 A.D.), Turks (1001), Mughals
India and Pakistan house many different races and languages and are often
referred to as "a museum of races." Present day Pakistan has a population of over
170 million (Pakistan Economic Survey, 2006-2007) and consists of more than 12
ethnic and linguistic groups, the majority being descendants of the invader stocks.
Ethnic groups from the southern part of Pakistan include Baloch, Brahui, Makrani
Baloch, Makrani Negroid, Parsi and Sindhi. Major populations represented by the
northern groups include Balti, Burusho, Kalash, Kashmiri, Pathan and Punjabis. The
latter form the majority population of this country and include several castes.
STUDY OBJECTIVE:
The main objective of the study is to shed light on the population histories of
numerous ethnic groups living in modern-day Pakistan. Earlier studies used only a
and since then many more informative Y-SNPs have been discovered (Karafet et al.,
2008) which have not been typed in this population. Another caveat of the earlier
work was the lack of samples from the Punjab which constitutes the majority
understand population origins and substructure and unravel the influence of Central
Asia, China, Greece and Persia on this population. Statistical analyses and
is my hope that these analyses will improve our knowledge of group membership
within Pakistan that will have practical applications in DNA based human forensic
LITERATURE REVIEW
3
PAKISTAN AND ITS POPULATIONS
Pakistan lies in a region that has seen the passage of many invaders, all of whom
have contributed to the racial and linguistic diversity found in this country. It is
bordered by China in the north, India in the east, and Iran and Afghanistan in the west,
while the Indian Ocean lies along the southern coastline. The Pakistani population
Economic Survey, 2006-2007) but the World Health Organization estimates the
Pakistan consists of four provinces, the northern areas and the Federally
Administered Tribal Areas (FATA) which are located on the Afghan frontier. More
than 18 ethnic and 60 linguistic groups (Grimes, 1992) reside in this country. Major
ethnic groups include Baloch, Brahui, Pathans, Punjabis and Sindhis. The majority
Punjabi-speaking populations show a great and complex admixture of many ethnic
castes and groups (Ibbetson, 1883) such as the Gujar, Jats, Meos, Rajput and Arians.
Other ethnic groups that are of anthropological interest include the Makrani-Negroid,
Mohanna and Parsi in the south and Balti, Burusho, Kalash and Kashmiri in
the north. Of particular interest is the Hazara population, which resides in
Baluchistan and the North West Frontier Province (N.W.F.P.). The geographic
locations of the above-mentioned Pakistani populations are shown in Figure I and their
Figure I. Map of Pakistan showing its neighbours, administrative regions and
the geographical distribution of the populations that are included in this study.
Table I: The possible origins and language affinities of Pakistani populations.
North
South
Three major Pakistani populations: the Baloch, Brahui and Makrani reside in
the province of Baluchistan and constitute the southern group. Historians believe
that the Baloch migrated from West Asia to South Asia. They claim that they are of
Semitic stock and that between the first and second millennia B.C. their homeland was the
ancient region of Nineveh and Babylon in modern day Iraq. From there they
migrated to Iran, Afghanistan and Pakistan. Many Baloch tribesmen reside in south-
east Iran as well. Some historians also claim that they came from Aleppo in Syria in
682 A.D. (Quddus, 1990) when at least 44 tribes migrated to Iran. Their movement
moved from Iran and occupied Sistan and as a result of Seljuq invasion they settled
on the land of Makran. In the fifteenth century they migrated eastward and settled in
Kachi. Now they occupy the area of Sibi and the Loralai District of Quetta Division in
migrated from west and central Asia and settled in the Sarawan and Jhalawan
regions of Kalat State in Baluchistan (Hughes-Buller, 1991; Quddus, 1990). They are
The southwestern dry and arid Makran coast of Pakistan is home to two
Makrani-Baloch expresses linguistic and ethnic affiliation with the neighboring Baloch
tribes (Grimes, 1992). However, many Makrani have Negroid features and are
provincial capital, Quetta, are the Hazara. The name Hazara is derived from the
Persian word meaning thousand. This population is also found in the town of
Parachinar in the NWFP and is widespread in Afghanistan. They have typical Mongol
features and claim descent from a detachment of a thousand soldiers left behind by
Genghis Khan during his invasion of India. Historical records show that they settled
The other populations from southern Pakistan include the Sindhi, Mohanna
and Parsi all of whom reside in the south eastern province of Sindh. The Sindh
province is referred to in several ancient texts ___ Sindomana by the Greek and
Parthians, Brahmans, Arabs, and finally by the British and Mohenjo-Daro, the jewel
of the Indus Valley Civilization, is located here. As a result of multiple invasions and
fishermen who have been residing on the banks of the River Indus for centuries.
The suggested origin of the Parsis is in Persia (Nanavutty, 1997). They are
followers of the Iranian prophet Zoroaster who migrated from Iran to the state of
Gujarat in northwest India in the 7th century A.D. after the collapse of the Sassanian
Empire. Many Parsis eventually settled in Mumbai in India and Karachi in Pakistan,
reside in the North West Frontier Province (N.W.F.P) and its adjoining tribal areas.
They also inhabit the southern and eastern part of Afghanistan and Baluchistan
province of Pakistan. They are also known as Pushtuns, Pakhtuns or Afghans and
descendants of soldiers who came with Alexander the Great and several historical
Among them are the Balti, Burusho and Kalash. Baltis speak a Sino-Tibetan language
and their suggested origin is in Tibet (Dani, 1991). They reside in Baltistan, the
north-eastern Himalayan region of Pakistan.
The Burusho, one of the isolated northern populations, also believe that they
are the descendants of Greek generals who came to the subcontinent with Alexander
the Great in 327-323 B.C. (Biddulph, 1977). They reside in Hunza, Nagar and Yasin
Valleys in the Karakorum Mountains and are the only population in Pakistan who
The Kalash also claim descent from Greek Macedonia, citing Alexander's
invasion of the subcontinent. They reside in the valleys of Bumburet, Rambur, and
Birir near Chitral in the Hindu Kush Mountain ranges in the NWFP. They have been
extensively studied by anthropologists for their unique culture and traditions (Lines,
1999).
migration, and colonization (Lahr and Foley, 1994). Studies reveal that human
history can be deciphered from the analyses of the human genome. The genomic
structure.
At the beginning of the 20th century data obtained from protein markers led to
2005). However, in recent years DNA based markers have proved to be more
informative DNA marker should be both highly polymorphic and selectively neutral.
DNA markers on the non-recombinant portion of the human Y chromosome and the
mitochondrial DNA are polymorphic markers that have been successfully applied to
shed light on human evolutionary history from the male and female perspective,
respectively.
Y CHROMOSOMAL VARIATIONS
et al., 1985; Lucotte and Ngo, 1985). Since then more than 600 binary
polymorphisms, the majority of them being SNPs, and numerous multi-allelic STR
markers have been identified on the human Y-chromosome (Karafet et al., 2008).
allows their organization in the form of a phylogenetic tree that shows relationships
have led to the development of a standardized nomenclature system for such a tree.
The initial tree based upon approximately 200 markers (Jobling and Tyler-Smith,
2003; Y Chromosome Consortium, 2002) was recently revised to identify 311 distinct
Y haplogroups (Karafet et al., 2008). The phylogenetic tree is rooted with respect to
clades designated A-T (Figure II). Karafet et al. (2008) refer to these as paragroups
in order to differentiate them from the 311 haplogroups that are identified by terminal
identified by STRs are designated as haplotypes, and those that are defined by the
Knijff (2000). A brief description of the salient features of major Y haplogroup clades
follows:
HAPLOGROUP A:
restricted to Africa (Hammer et al., 2001; Underhill et al., 2001). All individuals that
fall in this group carry the ancestral state for M42, M94, M139, and SRY10831.1 and
derived state for M91 and P97 (Karafet et al., 2008). The M91 lineage is subdivided
into three main haplogroups characterized by derived alleles for the markers P108, M6
and M32. These haplogroups have been mainly observed in the Khoisan and Bantu
speakers from South Africa, Pygmies from Central Africa and in the Sudanese,
Ethiopian and Mali populations of East Africa (Hammer et al., 2001; Semino et al.,
HAPLOGROUP B:
SNP. They are also derived for the markers M42, M94 and M139. All 17 branches
of clade B* are frequently found in sub-Saharan Africa. The major sub-clades are
B1*, defined by M236, and B2*, defined by M182. Sub-clade B1a, defined
by the M146 marker, is mainly found in Mali. The B2* cluster has several
haplogroups, one of which, B2a*-M150, is frequently
observed in East Africa (Sudan and Ethiopia). The B2b* haplogroups (M112 or M192 derived Y
chromosomes) are found in Pygmies from central and southern Africa (Cruciani et
al., 2002; Hammer et al., 2001; Jobling and Tyler-Smith, 2003; Semino et al., 2002;
chromosomes spread very early within the African continent and is supported by the
north and south of the Sahara Desert, eventually reaching the Levant about 130,000-
HAPLOGROUP C:
clade. It is defined by five mutations, the hallmark being the synonymous RPS4Y711
C to T transition (also referred to as M130) in the exon of the RPS4Y gene that was
among the earlier Y chromosomal polymorphisms to be identified (Fisher
et al., 1990). This clade has not been found in sub-Saharan Africa and the mutations
defining this haplogroup probably occurred in Asia after the migration of modern
humans out of Africa. Walter et al. (2000) have suggested that this mutation
originated in south Asia about forty to fifty thousand years ago with the dispersal of
modern humans from the Horn of Africa via a coastal or interior route towards south
Asia. The haplogroup is frequent in populations from Central and East Asia. It is
also found in many indigenous Australasian and Polynesian populations and the
Native American Indian tribes (Capelli et al., 2001; Hammer et al., 2001; Hudjashov
et al., 2007, Karafet et al., 2001; Kayser et al., 2006; Ke et al., 2001; Kivisild et al.,
2003; Scheinfeldt et al., 2006; Underhill et al., 2001; Zegura et al., 2004).
HAPLOGROUPS D and E:
populations mainly in Japan and Tibet (Su et al., 2000; Karafet et al., 2001). The 15
haplogroups that are part of this clade are all characterized by the presence of M174
T to C transition (Underhill et al., 2000). These are scattered throughout south East
Asia and among Andaman Islanders (Hammer et al., 2006; Thangaraj et al., 2003).
found in Africa, Levant, Europe, Central and South Asia (Hammer et al., 1998;
Underhill et al., 2001). Clade E* haplogroups are derived for several markers
including M96 and SRY4064. The major sub-clades are E1* and E2* that are
characterized by derived alleles for P147 and M75. The topology and nomenclature
of this branch has been recently revised with the discovery of several novel
polymorphism and accounts for 80% of clade E haplogroup. M2 or sY81 derived
the E1b1b* haplogroups defined by the M215 mutation are frequently observed in
north and east Africa, the Mediterranean basin and Europe (Hammer et al.,
1997). It has been suggested that clade E* haplogroups were spread by the Bantu
farmers during the Neolithic period (Passarino et al., 1998; Scozzari et al., 1999).
The representatives of these haplogroups traveled from the Middle East to southern
Europe and northern India and Pakistan (Cruciani et al., 2002; Hammer et al., 1998;
HAPLOGROUP F:
M168 derived haplogroup that have the derived allele for M89 C to T
et al., 2000) this clade is now also identified by several markers discovered by
Hammer et al., (2001). The haplogroup probably arose in East Africa about 45,000
years ago and dispersed to Eurasia through the Levantine corridor. Underhill et al.,
(2001) have suggested that the African ancestors first migrated to the Middle East
around 40,000 years ago and eventually expanded towards the west, east and north
giving rise to several major clades (G-T) of the Y phylogenetic tree. Paragroup F* is
found mainly on the Indian subcontinent and in Sri Lanka (Kivisild et al., 2003;
HAPLOGROUP G:
South East Europe, the Mediterranean region, Anatolia, West and Central Asia
(Behar et al., 2004; Cinnioglu et al., 2004; Jobling and Tyler-Smith, 2003; Regueiro et
al., 2006; Sengupta et al., 2006) and North Caucasus (Nasidze et al., 2003).
HAPLOGROUP H:
within this clade are separated into two major clusters: H1* and H2*. H1* clade is
defined by the M52 A-C transversion whereas the H2* haplogroup is characterized
by the Apt G to A transition. Both have been observed in India but only H1* has
been reported in populations from Pakistan (Jobling and Tyler-Smith, 2003; Karafet
HAPLOGROUP I:
initially by the M170 A-C transversion. It is thought that this mutation was acquired
during the early expansion of Levantine populations towards the west. Clade I
(Hammer et al., 2001; Jobling and Tyler-Smith, 2003; Rootsi et al., 2004).
HAPLOGROUP J:
One of the major clades that is defined by the 12f2a and more recently the
M304 deletion and P209 marker (Karafet et al. 2008). It has two main branches J1*
which is M267 derived and J2* which is derived for M172 (Cinnioglu et al., 2004;
Underhill et al., 2000). The J* clade and its branches probably arose in the Middle
East and Anatolia (Turkey) from where they spread to west Asia and Eurasia
(Hammer et al., 2000; Semino et al., 2004). It is frequent in both India and Pakistan
HAPLOGROUP K:
This haplogroup is a mixed bag characterized by derived alleles for the M9 (C-G
transversion) marker (Underhill et al., 1997). Its low incidence in Africa illustrates
that the mutation occurred after the migration out of Africa. A recent survey by
Karafet et al., (2008) demonstrated derived states for an additional three markers
(P128, P131 and P132) for this haplogroup. The K1 branch derived for M147 has
been observed in populations from the Indo-Pak subcontinent (Underhill et al., 2001).
HAPLOGROUP L:
The L* lineage probably arose in West Asia in a pre-Holocene era and was
initially identified in samples from the Indus Basin in Pakistan (Underhill et al., 2000).
One branch L1 (derived for M27 and M76) probably arose in the Indo-Pak
India and the Northern region of India and Pakistan. The highest frequency at South
India and South-West Pakistan suggests that this lineage originated in the Indian
the Middle East, Central Asia, Northern Africa, and Europe and along the
Mediterranean coast (Cinnioglu et al., 2004; Cruciani et al., 2002; Jobling and Tyler-
HAPLOGROUP M:
Indonesia, Melanesia, Papua and New Guinea (Capelli et al., 2001; Hurles et al.,
2002; Kayser et al., 2006; Scheinfeldt et al., 2006; Su et al., 2000). Currently 20
mutations characterize the 12 haplogroups found within this branch (Karafet et al.,
2008).
HAPLOGROUPS N and O:
haplogroups clades N* and O*. M231 and LLY22g characterize clade N* and N1*
and the M175 deletion clade O* (Cinnioglu et al., 2004). Haplogroup N* probably
originated in Asia but is now predominantly found in European populations (Karafet
clade is characterized by the Y-SNP O3*-M122 and it predominates in East Asia and
2005).
HAPLOGROUPS P, Q and R:
Clade P* is defined by the presence of 92R7, M45, M74 and several other
SNPs that are derived for the M9 mutation as well. This clade includes several major
Central Asia from where these chromosomes spread throughout the world (Seielstad
et al., 2003). These Y chromosomes are found at high frequency in North Eurasia
and Siberia (Karafet et al., 2002) and at lower frequencies in Europe, East Asia and
the Middle East. One major branch of this haplogroup (Q1a3a*-M3) is almost
Eight mutations, including the M207 A-G SNP, represent clade R. This clade
that around 30,000 years ago the R*-M207 mutation expanded westwards to reach
Europe, Caucasus, Middle East, Central Asia, northern India and Pakistan. The R1*
haplogroup is one of the most common in Europe and west Asia and probably
originated in central Asia. The R1a1*-M17 clade that is characterized by deletion of
HAPLOGROUPS S and T:
Clade S* lineages are also identified by P202 and P204 markers and are found in
Oceania and Indonesia (Kayser et al., 2006; Scheinfeldt et al., 2006). Clade T* that
is also characterized by M70, M193 and M272 is further delineated by M320 and P77
and has been observed in the Middle East, Africa, and Europe (Underhill et al., 2001;
MATERIALS AND METHODS
-4-
COLLECTION OF SAMPLES:
For this study, blood samples were collected from 1213 unrelated male
consent was obtained from all participants included in this study. Ethnicity of the
Dickinson, Mountain View, CA.). A total of 66 Baloch and 117 Brahui samples
were collected from the Quetta and Kalat Divisions in Baluchistan.
The 97 Burusho samples were collected from Hunza and Nagar in
the Northern Areas. The 224 Hazara samples were collected from the areas of Parachinar
and Quetta. The 44 Kalash samples were collected from Chitral Division. The 90 blood
samples of Parsis and 14 Balti samples were collected from Karachi. The 96 Pathan
samples were collected from the North-West Frontier Province. The 138 Sindhi samples
were collected from Sukkur in Sindh. The 16 Meo, 10 Rajput and 159
Gujar samples were collected from the rural areas of Punjab Province. 27 Makrani-
The 77 Greek DNA samples were provided by Dr. Myrto Papaioannou (Unit of
Greece).
The Epstein-Barr Virus (EBV) producing B95-8 marmoset cell line (American
supplemented with 1% fetal calf serum (FCS; Biochrom AG, Berlin, Germany) and
(International Equipment Company, Needham, MA), at 1000 rpm (300g) for 10
minutes. The supernatant was decanted and the pellet was washed twice in 5 ml of
wash medium followed by centrifugation at 1000 rpm for 10 min. The cells were
transferred into a 25 cm² culture flask (Corning, Corning, NY) containing RPMI-1640
medium supplemented with 1X GPPS and 10% FCS. The flask was incubated at
37 °C in a humidified atmosphere of 93% air and 7% CO2. The culture was gradually
expanded and split first into a 75 cm² and finally into 125 cm² flasks. When the
medium in the culture flasks became yellow they were incubated at 34 °C without any
8th day the cell pellet was removed from the suspension by centrifugation at 1000
rpm for 10 minutes. The supernatant containing EBV was filtered through a 0.45 µm
Millipore membrane filter (Nilsson, 1976). The EBV supernatant was aliquoted into
cryovials (Corning, Corning, NY) and stored at -70 °C until use. A 1 ml aliquot of this
PREPARATION OF LYMPHOCYTES:
vacutainer tubes (Becton Dickinson, San Jose, CA). The blood was layered over 3ml
(Corning, Corning, NY). Each sample was centrifuged at 2000 rpm (400g) for 20
minutes. The upper plasma layer was aspirated and PBMC were collected from the
interface between the plasma and Histopaque and transferred into another sterile 15
ml tube containing 10 ml wash medium and centrifuged at 1000 rpm for 10 minutes.
The supernatant was decanted and the cell pellet washed twice with 5 ml wash
medium and resuspended in 1 ml of wash medium (Boyum, 1968). Cell viability was
CELL COUNTING BY TRYPAN BLUE EXCLUSION TEST:
Cell viability was calculated by the trypan blue exclusion test as described by
Kruse (1973). An equal volume (10 µl) of cell suspension was mixed with 0.16%
(w/v) trypan blue solution in physiological saline. Cells were counted using a
haemocytometer. Unstained live and blue-stained dead cells were counted in the
central 1 mm square of the counting chamber. The cell viability was calculated by the
following formula:
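A minimal sketch of the usual trypan blue calculation, assuming that viability is expressed as the percentage of unstained (live) cells among all cells counted in the chamber. The function name and the example counts are illustrative and are not taken from this study.

```python
def cell_viability_percent(live_unstained: int, dead_stained: int) -> float:
    """Percentage of viable cells from a trypan blue count.

    Assumption: viability = unstained cells / (unstained + stained) x 100.
    """
    total = live_unstained + dead_stained
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * live_unstained / total


if __name__ == "__main__":
    # Hypothetical count: 184 unstained and 16 blue-stained cells.
    print(f"{cell_viability_percent(184, 16):.1f}% viable")
```

Under this convention a culture would meet a 90% viability cut-off when no more than one counted cell in ten takes up the dye.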
LINES:
human lymphoblastoid cell lines were established. Approximately 4-5 x 10⁶ PBMCs
cyclosporin A) and 1 ml EBV supernatant prepared earlier. The flask was incubated
at 37 °C in a humidified atmosphere of 93% air and 7% CO2, keeping the cap of the flask
slightly loose (Walls and Crawford, 1987). The culture was examined periodically
under an inverted microscope. After 5-6 days, when colonies had formed and the culture
medium had become acidic, the culture was fed with feeding medium (RPMI-1640, 10-15%
FCS and 1X GPPS). When the transformed cell density in a culture flask had
suitably increased, half of the culture was transferred into a 75 cm² culture flask and
CRYOPRESERVATION OF CELL LINES:
For cryogenic preservation, cell viability was checked by the trypan blue
exclusion test as described earlier. Only cultures with cell viability > 90% were
frozen. The volume of cell suspension containing 5 x 10⁶ cells was centrifuged at
1000 rpm for 10 minutes. The supernatant was decanted and the cell pellet was
vial. The vial was kept in a polystyrene box at -70 °C overnight so that the
temperature decreased gradually. The following day the vial was transferred to the
vapour phase of the liquid nitrogen cryo-storage system (Jencons, Leighton Buzzard,
For the isolation of total genomic DNA a modified organic method was used
the cell pellet 19 ml STE buffer (100 mM sodium chloride, 50 mM Tris and 1 mM
EDTA; pH 8.0) was added. Next 1 ml of 10% sodium dodecyl sulphate (SDS) was
The samples were incubated overnight in a shaking water bath at 55 °C and extracted
the following day with an equal volume of Tris base equilibrated phenol (pH 8.0). The
samples were mixed for 10 minutes, placed on ice for 10 minutes and then
centrifuged in an MSE 3000i (Mistral, UK) at 4 °C for 40 minutes at 3200 rpm. The
aqueous layer containing the nucleic acid was collected in a fresh, labeled 50 ml
centrifuge tube. The next extraction was done by adding an equal volume of chilled
24:1 (v/v) chloroform:isoamyl alcohol. The samples were mixed and the aqueous
layer was collected in a fresh 50 ml tube. For precipitation of nucleic acids, 1/10
added and mixed until white precipitates formed. These samples were stored
overnight at -20 °C or at -70 °C for 15-20 minutes. Samples were then centrifuged at 3200
rpm for 90 minutes to pellet the nucleic acid and the pellet was washed with 5 ml of
chilled 70% ethanol. The pellets were vacuum dried for 10 minutes. To the pellets,
1 ml Tris-EDTA (TE; 10 mM Tris, 1 mM EDTA; pH 8) was added and the samples
were incubated at 37 °C for 1 hour to resuspend the pellets. To digest the RNA, 10 µl
of RNase A (10 mg/ml) was added to the samples and they were incubated at 37 °C
for 2 hours in a shaking water bath. The RNase was subsequently removed by
in a shaking water bath. At this point the samples could be stored at 4 °C till further
and an equal volume of chilled isopropanol was added. The samples were mixed
until the DNA was seen and stored at -20 °C overnight or at -70 °C for 15-20 minutes.
DNA was pelleted and washed with 5 ml of 70% chilled ethanol. The pellet was
vacuum dried for 10 minutes and the DNA was resuspended in 1 ml of 10 mM Tris-
The optical density (OD) of the samples was measured at 260nm and 280nm (ideally
A dilution factor of 50 was usually employed and the correction factor for double
stranded DNA is 50. If the OD260/OD280 ratio was 1.7-2.0, the DNA was considered pure
and free of contaminating phenol or proteins and suitable for further analysis. Each sample
was kept in an appropriately labeled microcentrifuge tube and stored at 4 °C until use.
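The spectrophotometric check described above can be sketched as follows. This is an illustration only: it assumes the conventional relation in which the concentration in µg/ml equals OD260 multiplied by the dilution factor and by the correction factor of 50 for double stranded DNA, and it uses the 1.7-2.0 purity window stated in the text. The function names and the example readings are hypothetical.

```python
def dna_concentration_ug_per_ml(od260: float,
                                dilution_factor: float = 50.0,
                                correction_factor: float = 50.0) -> float:
    """Estimate the dsDNA concentration of the undiluted stock.

    Assumption: concentration (ug/ml) = OD260 x dilution x correction factor,
    with both factors equal to 50 as quoted in the text.
    """
    return od260 * dilution_factor * correction_factor


def is_acceptably_pure(od260: float, od280: float,
                       low: float = 1.7, high: float = 2.0) -> bool:
    """True when the OD260/OD280 ratio lies in the accepted window."""
    if od280 <= 0:
        raise ValueError("OD280 must be positive")
    return low <= od260 / od280 <= high


if __name__ == "__main__":
    # Hypothetical readings for one diluted sample.
    od260, od280 = 0.20, 0.11
    print(dna_concentration_ug_per_ml(od260), "ug/ml")
    print("pure enough:", is_acceptably_pure(od260, od280))
```

A ratio below the window would usually point to residual protein or phenol, which is why such samples would be re-extracted rather than used directly.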
Some DNA samples were also directly prepared from the blood sample. The
procedure for the extraction of the DNA from blood was the same as above with
some minor modifications. Initially the blood was mixed with the cell lysis buffer
EDTA; pH 8.0) and kept on ice for 30 minutes. The samples were centrifuged for 10
minutes at 1200 rpm. The pellets were again washed with 10 ml of lysis buffer and
centrifuged for 10 minutes at 1200 rpm. To this pellet 4.75 ml of STE buffer was
added along with 250 µl of 10% SDS (drop wise with gentle vortexing) followed by 10
µl of proteinase K. The tube was incubated overnight in a rotary water bath at 55 °C.
The next day, the samples were extracted using phenol and chloroform:isoamyl
alcohol as described earlier. After this first extraction, 10 µl of RNase A (10 mg/ml)
was added and the samples were incubated at 37 °C for 2 hours. After 2 hours the
samples were again treated with 250 µl of 10% SDS and proteinase K and incubated
at 55 °C for 1 hour. Subsequent extraction and precipitation were the same as
PHENOL EQUILIBRATION:
200-500 ml distilled phenol were stored at -20 °C. Before use, the phenol was melted
final concentration of 0.1% (w/v). The melted phenol was extracted once with an
equal volume of 1.0 M Tris buffer (pH 8.0) and 3 to 4 times with 0.1 M Tris (pH 8.0).
This equilibrated phenol was stored at 4 °C in equilibration buffer (0.1 M Tris) to which
0.2% β-mercaptoethanol (v/v) was added. Under these conditions it was stable for
GENOTYPING OF Y MARKERS BY POLYMERASE CHAIN REACTION
(PCR):
Polymerase chain reaction was first described in 1985 (Saiki et al., 1985) and
the method was extensively employed in this study to amplify the desired fragment of
Y chromosome from genomic DNA. The 93 Y markers that were genotyped in this
study are shown in table II and a brief overview of the various methods used to
The ARMS PCR technique is a simple method for the detection of single base
mutations. In this allele specific PCR the genomic DNA is only amplified when a
specific allele is present. Two sets of reactions are run in parallel using three types
of primers, one of which is common in both reactions. One set consists of the
common primer and a primer that is specific for the normal sequence. The other
contains the common primer and another that is specific for the mutant sequence.
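The read-out logic of the two parallel ARMS reactions can be sketched as below. This is an illustrative interpretation rather than the scoring procedure used in this study: it assumes a single-copy Y-chromosomal marker, so that exactly one of the two reactions is expected to amplify, and the labels returned for the unexpected outcomes ("ambiguous", "failed") are hypothetical.

```python
def call_arms_allele(normal_reaction_amplified: bool,
                     mutant_reaction_amplified: bool) -> str:
    """Interpret a pair of allele-specific (ARMS) PCR reactions.

    Each reaction contains the common primer plus one allele-specific primer.
    Assumption: the marker is single copy on the Y chromosome, so a band in
    both reactions is flagged for retyping rather than called heterozygous.
    """
    if normal_reaction_amplified and not mutant_reaction_amplified:
        return "normal"
    if mutant_reaction_amplified and not normal_reaction_amplified:
        return "mutant"
    if normal_reaction_amplified and mutant_reaction_amplified:
        return "ambiguous"  # e.g. non-specific priming; repeat the assay
    return "failed"         # no product in either tube; repeat the assay


if __name__ == "__main__":
    # Hypothetical gel read-out: band only in the mutant-specific reaction.
    print(call_arms_allele(False, True))
```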
The AFLP PCR is based on the principle that a base change results in the
creation or abolition of a restriction site. PCR primers are designed from sequences
flanking the restriction site to produce a 100-500 base pair product. The amplified
fragments are analyzed by agarose gel electrophoresis. The SNPs typed by AFLP
Table II: A list of Y haplogroups, markers, type of polymorphism and genotyping methods used in this study. Y haplogroups were
determined in a hierarchical manner, screening initially with markers that identified deep lineages (bold) and subsequently genotyping
markers that further delineated the tree in the target population. The typing methods were amplified fragment length polymorphism
(AFLP), denaturing high performance liquid chromatography (DHPLC), amplification refractory mutation system polymerase chain reaction
Haplogroup Markers Polymorphism Genotyping Method Haplogroup Markers Polymorphism Genotyping Method Haplogroup Markers Polymorphism Genotyping Method
A M91 del T DHPLC H1b M97 TG DHPLC O1b M110 TC Seq
A1 M31 GC DHPLC H2 Apt GA AFLP O2 P31 TC Seq
A2 M6 TC DHPLC I M170 AC ARMS O2a1 M88 AG Seq
A2 PK1 CA AFLP J 12f2 del PCR O2a1 M111 del TT Seq
A3a M32 TC DHPLC J1 M267 TG ARMS O2a1a PK4 AT DHPLC
B M60 ins T DHPLC J1a M62 TC ARMS O2b SRY+465 CT AFLP
B2a M150 CT DHPLC J2 M172 TG ARMS O3 M122 TC ARMS
B2a1 M109 CT DHPLC J2a1b M67 AT ARMS O3a3 L1Y LINE1 ins PCR
B2a1 M152 CT DHPLC J2a1b1 M92 TC ARMS O3a5 M134 del G DHPLC
B2a1 M218 CT DHPLC J2b M12 GT ARMS O3a5a M117 del ATCT DHPLC
C RPS4Y CT AFLP K M9 CG AFLP O3a5a M133 del T DHPLC
C1 M8 GT Seq K1 M147 ins T Seq P 92R7 CT AFLP
C2 M38 TG Seq K4 M177 CT Seq P M45 GA DHPLC
C3 M217 AC Seq L M20 AG AFLP P M74 GA DHPLC
C3 PK2 T-C ARMS L M11 AG AFLP Q M242 CT ARMS
C3C M48 AG ARMS L M185 CT DHPLC Q2 M25 GC DHPLC
DE YAP Alu ins PCR L1 M27 CG ARMS Q2 M143 GT DHPLC
E SRY-8299 GA AFLP L1 M76 TG DHPLC R M207 AG ARMS
E3a sY81 AG AFLP L2 M317 del GA DHPLC R1 M173 AC ARMS
E3b1 M35 GC ARMS L2 M349 GT DHPLC R1a1 M17 del G ARMS
E3b1a M78 CT ARMS L3 M357 CA DHPLC R1a1 SRY-1532 AGA AFLP
E3b1a1 M148 AG DHPLC L3a PK3 TC ARMS R1a1a M56 AT ARMS
E3b1c M123 GA ARMS NO M214 AG ARMS R1a1b M157 AC DHPLC
E3b1c2 M136 CT DHPLC N LLY22g CA AFLP R1a1c M87 TC DHPLC
F M89 CT ARMS N M231 GA DHPLC R1a1d PK5 CT AFLP
G M201 GT ARMS N3 TAT TC AFLP R1b2 M73 del GT DHPLC
G2a P15 C-T DHPLC O M175 del TTCTC Seq R1b3F SRY-2627 CT AFLP
H M69 TC DHPLC O1 M119 AC DHPLC R1c M343 CA ARMS
H1 M52 AC ARMS O1a M101 CT DHPLC R2 M124 CT ARMS
H1 M82 del AT DHPLC O1b M50 TC DHPLC T M70 AC ARMS
H1a M36 TG DHPLC O1b M103 CT DHPLC T M193 ins CAAA DHPLC
Table III: List of SNPs typed by AFLP method.
2 LLY22g Hind III
3 M9 Hinf I
4 M11 Msp I
6 M20 Ssp I
7 PK1 Psp1406 I
8 PK5 Mnl I
9 RPS4Y Bsl I
10 SRY+465 Fnu4H I
12 SRY2627 Ban I
13 SRY8299 BsrBI
15 TAT Mae II
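The restriction-based typing summarized in Table III can be illustrated with a small sketch that predicts the fragment sizes expected on an agarose gel when an amplicon does or does not carry the enzyme's recognition site. The recognition sequences and cut positions used for Hind III (A^AGCTT) and Msp I (C^CGG) are the standard ones; the amplicon sequences are invented for illustration and are not the primers or products used in this study. The sketch assumes a palindromic site, so only one strand is scanned.

```python
def fragment_lengths(amplicon: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after digesting `amplicon` at every `site`.

    `cut_offset` is the number of bases from the start of the recognition
    site to the cut position on the scanned strand (1 for A^AGCTT and C^CGG).
    Assumption: the site is palindromic, so the reverse strand is not scanned.
    """
    seq = amplicon.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical 20 bp amplicons differing by one base inside a Hind III site.
    with_site = "GGCTAGAAGCTTCCATGGTA"
    without_site = "GGCTAGAAGCTCCCATGGTA"
    print(fragment_lengths(with_site, "AAGCTT", 1))     # site present: two fragments
    print(fragment_lengths(without_site, "AAGCTT", 1))  # site abolished: uncut product
```

An uncut product therefore indicates the allele that abolishes the site, while two smaller bands indicate the allele that retains it.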
PREPARATION OF AGAROSE GEL:
was mixed in 300 ml of or TAE electrophoresis buffer (0.04M Tris-acetate and 0.01 M
EDTA / liter) to make a 2% (w/v) agarose gel. The agarose was melted in a
microwave oven keeping the cap of the bottle loose. When the agarose was dissolved
added and mixed thoroughly. The gel was placed in a shaking water bath at 55 °C for
20-25 minutes. A gel tray was sealed with rubber clamps and placed on a level
horizontal surface. The required combs were placed at appropriate positions (0.5-1.0 mm
above the base of the gel). The gel was poured into the gel tray. After the gel
solidified, the combs and clamps were removed from the gel tray. The gel was placed
Orange G loading dye (0.125% orange G, 20% Ficoll, 100mM EDTA) was
added to each sample and the samples were loaded on the gel. A 100 bp ladder
(Promega) was loaded in the first well. Electrophoresis was carried out for
approximately 40 minutes at 150 volts using a power pack (3000 Bio Rad
MULTIPLEX PCR:
primer pairs which were labeled either with TET, HEX or FAM (Table IV). The multiplex
PCR assay was performed in a 10 µl final volume. The reaction mixture was prepared
in two steps. In the first step, a Super Taq polymerase / Taq Start™ Antibody premix was
prepared. Briefly, 0.13 U Super Taq enzyme (HT
Biotechnology Ltd) was incubated with 2.3 µM Taq Start™ Antibody (Clontech) in the
presence of 0.874 µl/reaction Taq Start™ Dilution buffer for 5-7 minutes at room
temperature. In the second step, the PCR master mix was prepared. Briefly the reaction
MgCl2, 50 mM KCl, 0.01% gelatin and 0.01% Triton X-100), 0.7 mM MgCl2, 200 µM
dNTPs, primers (concentrations are given in Table IV) and 1.225 µl/reaction Super Taq
The above mixture was added to the tubes containing 20 ng (1 µl) genomic
DNA. PCR was performed by the touchdown protocol described in Ayub et al.
(2000). PCR was carried out using the following conditions: 1 cycle of 1 minute at
94 °C; 8 cycles of 1 minute at 94 °C, 1 minute at 60 °C and 1 minute at 72 °C (the annealing
SAMPLE PREPARATION:
0.3 µl of amplified product was mixed with 2.7 µl of dye (0.342 µl Dextran blue,
1.5 µl formamide, 0.478 µl autoclaved deionized water and 0.38 µl TAMRA 300 or 500
internal lane size standard / reaction). Samples were denatured at 90 °C for 2 minutes
and placed on ice until loading. Samples were run on an ABI 377 DNA sequencer for
one and a half hours. The data were collected using ABI collection software. The
fragment sizes were estimated using Gene Scan software (v2.1). The alleles were
and 2-3 g of mixed-bed ion-exchange resin was added to the urea and mixed for 2-3
minutes. The solution was filtered through a Whatman No. 1 filter paper into a 50 ml
graduated cylinder already containing 1.5 ml of 10X TBE (Trizma base; Tris
[hydroxymethyl] aminomethane 70 g, 55 g boric acid and 9.0 g ethylene diamine tetra
acetic acid (EDTA, pH 8-8.2)). The volume was made up to 15 ml and filtered through a
0.2 µm Millipore filter paper using a Millipore vacuum filtration assembly. To the filtered
solution 5 µl of 10% ammonium persulphate (APS) and 10.5 µl TEMED were added
The rear and the front plate (12 cm) were washed with 1% Alconox detergent
first with de-mineralized water and then with deionized water. When plates were dry,
the rear plate was placed on the gel casting apparatus (Sequencing Gel Caster: model
SGC-1) with the inside of the plate facing up. Wet 0.2 mm spacers were placed on the
rear plate. The front plate was placed half way down on top of the rear plate. The 4 %
acrylamide solution was drawn into a 50 ml syringe and poured slowly between the two
plates. The flat edge of a 0.2 mm comb was inserted in between the plates and plates
were sealed with clamps. The plate assembly was left for 30-45 minutes for the gel to
polymerize. The comb and clamps were removed. The plate assembly was washed
with demineralized water then deionized water and left for 15- 20 minutes. The shark
tooth side of the comb was inserted so that the teeth of the comb just touch the gel.
The plates were fixed on the gel cassette then on to the sequencer. The upper and
lower buffer reservoirs were attached. Plate check was carried out to ensure that the
gel plate was clean. 1X TBE buffer was filled in upper and lower buffer reservoirs.
Before loading the samples the gel was pre-electrophoresed for 10 minutes.
Table IV: YSTR Primers sequences.
DYS19-L CTA CTG AGT TTC TGT TAT AGT TET 0.236
DYS390-R TGA CAG TAA AAT GAA CAC ATT GC FAM 0.127
DYS391-L-N CTA TTC ATT CAA TCA TAC ACC CAT AT FAM 0.384
DYS392-R-N CAG TCA AAG TGG AAA GTA GTC TGG HEX 0.155
DYS393-L GTG GTC TTC TAC TTG TGT CAA TAC 0.18
DYS393-R AAC TCA AGT CCA AAA AAT GAG G HEX 0.088
YSTR2
DYS389I-L CCA ACT CTC ATC TGT ATT ATC TAT TET 0.032
DYS389II-L CCA ACT CTC ATC TGT ATT ATC TAT TET 0.032
DYS425-R AGC TCT ACA AGC CAT TGT GAT CT FAM 0.861
Multiplex  Primer name  Primer sequence  Dye label  Final conc. (µM)
YSTR2 (cont.)
DYS426-L GGT GAC AAG ACG AGA CTT TGT G HEX 0.30
DYS426-R CTC AAA GTA TGA AAG CAT GAC CA 0.25
YSTR3 DYS434-L CAC TCC CTG AGT GCT GGA TT TET 0.2
DYS438-L TGG GGA ATA GTT GAA CGG TAA HEX 0.2
DYS439-L TCC TGA ATG GTA CTT CCT AGG TTT TET 0.2
AUTOMATED FLUORESCENT DNA SEQUENCING:
using an ABI 377 DNA Sequencer and the dye terminator cycle sequencing ready
The reaction mixture contained: 1X PCR buffer II, 1.5 mM MgCl2, 100 µM dNTPs, 1 U
DNA Taq polymerase, 1.0 µM primer (forward and reverse each) and 40 ng DNA
template. The following PCR cycling conditions were used for the amplification: 1 cycle
the primer and described in Appendix I), 1 minute at 72 °C; 1 cycle of 10 minutes at 72 °C.
Amplified PCR products were first checked on 2% agarose gel. The amplified
product was precipitated with 50l of 95% ethanol. Sample was then washed with
200l of 70% ethanol and the pellets were resuspended in autoclaved deionised
water. If required, PCR products were also purified with the QIAquick PCR product
reaction was carried out in 10.0 l total reaction volume consisted of the following: 2.0
l sterile deionised H2O, 4.0 l Terminator ready reaction mix. (Includes labelled dye
terminators, buffer, and dNTPs), 1.0 l forward or reverse sequence specific primer
PCR was performed using a Thermo Hybaid multi-block system (MBS 0.2S), or
Thermo Hybaid PxE 0.2 thermal cycler for 25 cycles as follows: 10 seconds at 96oC, 5
After amplification, the products were precipitated with 50l of 95% ethanol,
washed with 200l of 70% ethanol and vacuum dried. The pellets were resuspended
in 5l of ABI loading buffer, diluted with formamide (1:5), samples was denatured at
95C for 2 minutes and placed on ice until loading. Samples were run on ABI 377
48
DNA sequencer for seven hour. The data was collected by using ABI collection
software.
10 ml of deionised water, placed on a hot plate with constant stirring. After dissolving
the urea, 2.5 ml of a 19:1 acrylamide gel solution (Sequa gel) and 2.5 ml of 10X TBE
were added and the volume was made up (q.s.) to 25 ml with sterile deionised water. The solution was filtered through a
0.2 µm Millipore membrane filter and degassed using a Millipore vacuum filtration
assembly. To the filtered solution, 200 µl of 10% APS and 5 µl of TEMED were added
and the mixture was immediately poured into the gel plates. The remaining procedure was the same as
(DHPLC):
was initially developed by Oefner and Underhill (1995). This is a powerful technique in
products from a wild type DNA (control sample) and the test sample. The DNA
fragments are separated on a specialized DNA Sep column based upon the principle
of ion-pair reversed phase HPLC carried out under denaturing conditions. The
Transgenomic WAVE™ DNA fragment analysis system was used for DHPLC work.
PCR was carried out in a 15 µl total reaction volume. The concentrations of reagents for
the PCR reaction were: 1X PCR buffer, 1.5 mM MgCl2, 200 µM dNTPs, 1 U BioTaq DNA
polymerase, 1.0 µM primer (forward and reverse each) and 40 ng DNA template (20 ng/µl).
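As a rough worked example, the final concentrations listed above can be converted into pipetting volumes for one 15 µl reaction with C1 x V1 = C2 x V2. Only the final concentrations, the 15 µl volume and the 20 ng/µl template come from the text; the stock concentrations below (10X buffer, 25 mM MgCl2, 10 mM dNTPs, 10 µM primers, 5 U/µl polymerase) are assumptions chosen for illustration.

```python
# Stock concentrations: ASSUMED for illustration, not stated in the thesis.
STOCKS = {
    "PCR buffer (X)": 10.0,
    "MgCl2 (mM)": 25.0,
    "dNTPs (uM)": 10000.0,
    "forward primer (uM)": 10.0,
    "reverse primer (uM)": 10.0,
}
# Final concentrations as given in the text.
FINALS = {
    "PCR buffer (X)": 1.0,
    "MgCl2 (mM)": 1.5,
    "dNTPs (uM)": 200.0,
    "forward primer (uM)": 1.0,
    "reverse primer (uM)": 1.0,
}


def reaction_volumes(total_ul=15.0, template_ng=40.0, template_ng_per_ul=20.0,
                     taq_units=1.0, taq_units_per_ul=5.0):
    """Return component volumes in microlitres using V1 = C2 * V2 / C1."""
    vols = {name: FINALS[name] * total_ul / STOCKS[name] for name in FINALS}
    vols["DNA template"] = template_ng / template_ng_per_ul
    vols["Taq polymerase"] = taq_units / taq_units_per_ul  # assumed 5 U/ul stock
    # Water makes up the remainder of the reaction volume.
    vols["water"] = total_ul - sum(vols.values())
    return vols


if __name__ == "__main__":
    for name, vol in reaction_volumes().items():
        print(f"{name:22s} {vol:5.2f} ul")
```

With a template stock of 20 ng/µl, the 40 ng of DNA corresponds to 2 µl per reaction; the remaining volumes depend entirely on the assumed stocks.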
taking 5 µl of each PCR product. Equal volumes of the PCR products of a wild type
and each test sample were separately mixed and denatured at 95°C for 5 minutes.
They were then allowed to reanneal by decreasing the temperature at the rate of
Before setting up the experiment, the instrument was initially allowed to run
(purged) with 33% of buffer A (0.1 M triethylammonium acetate (TEAA) solution, pH 7.0),
33% of buffer B (0.1 M TEAA solution containing 25% acetonitrile, pH 7.0) and 34% of
buffer C (75% acetonitrile solution) for 2-5 minutes. After purging, the column was
equilibrated for 30 minutes with 50% of buffer A and 50% of buffer B at a flow rate of
0.9 ml/min. Five needle and injection port washes were carried out using buffer D (8%
acetonitrile).
The DNA sequence to be screened for polymorphisms was copied to the Wave
Maker (version 4.1) software and the appropriate temperature and gradient method for
that particular sequence was determined. A sample sheet specifying the tube
numbers, injection volumes, sample IDs and gradient was prepared. The system was
The optimal melting temperature for any DNA fragment can be determined by
(http://insertion.stanford.edu/melt.html).
RESULTS
-5-
SECTION 1
deletions) identify stable Y haplogroups and lineages. More than 600 such markers
representing 16 ethnic groups from Pakistan. The ethnic groups were categorized
broadly into two groups (Table I). The northern group was represented by unrelated
males from the Balti, Burusho, Hazara, Kalash, Kashmiri, Pathan and Punjabi ethnic
groups. Punjabis constitute the majority of Pakistan's population and most reside in
the Punjab province adjoining India. The Punjabi samples analyzed were 185
unrelated male samples of the Gujar, Meo and Rajput castes. The southern group
samples were initially analyzed for four markers representing clades close to the root
of the Y phylogenetic tree. These included SRY10831.1 (clade B*), RPS4Y711 (clade
C*), YAP (clade E*) and M89 (clade F*). The frequencies of these B*, C*, E* and F*
haplogroups in different ethnic groups of Pakistan. Among these four (B*, C*, E* and
F*) haplogroups, F* was the most frequent in both northern and southern populations
(Figure III). As expected, the majority (85%) of Y chromosomes from Pakistan were
derived from M89. The M89 derived alleles are frequently found in most world
populations residing outside Africa, and represent YCC clades F through T (Figure
frequencies in the different ethnic groups of Pakistan (Table VI). The thirty-three
M60 haplogroup was observed in 0.9 % of the Brahui and 3% of the Makrani-Negroid
(60%). It was also present in the Brahui, Mohanna, Burusho, Meo and Gujar with a
frequency that ranged from 1.6% to 8.2% (Tables VI and VII). Individuals carrying the
derived allele for the RPS4Y711 marker were further sub-typed for five additional markers
that identify clades C1, C2 and C3. These included the markers M8 (C1*), M38
(C2*), M217, PK2 (C3*), and M48 (C3a). Of these, only PK2 was detected. The PK2
haplogroup C3* (Mohyuddin et al., 2006). All of the Hazara (60%) and Burusho
(8.2%) RPS4Y711 derived Y chromosomes also had the derived allele for the PK2
marker.
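The stepwise typing strategy used here (markers near the root first, then sub-typing only of chromosomes carrying a derived allele) can be sketched as a walk down a marker tree. The marker and haplogroup pairs below are taken from the text and Table VI, but only part of the tree is shown and intermediate clades that are not named here (for example between K and R) are omitted; the data structure and the function name are illustrative assumptions, not the procedure actually implemented in the laboratory.

```python
# Partial marker tree: parent haplogroup -> list of (marker, child haplogroup).
# Pairs follow the text and Table VI; the tree is deliberately incomplete.
TREE = {
    "root": [("M60", "B"), ("RPS4Y711", "C"), ("M89", "F")],
    "C": [("PK2", "C3")],
    "F": [("M201", "G"), ("12f2a", "J"), ("M9", "K")],
    "G": [("P15", "G2a")],
    "J": [("M267", "J1"), ("M172", "J2")],
    "J2": [("M67", "J2a2")],
    "J2a2": [("M92", "J2a2a")],
    "K": [("M20", "L"), ("M207", "R")],
    "L": [("M27", "L1"), ("M317", "L2"), ("M357", "L3")],
    "L3": [("PK3", "L3a")],
    "R": [("M173", "R1"), ("M124", "R2")],
    "R1": [("M17", "R1a1")],
}


def assign_haplogroup(derived_markers):
    """Follow derived markers down the tree and return (haplogroup, marker path)."""
    node = "root"
    path = []
    while True:
        for marker, child in TREE.get(node, []):
            if marker in derived_markers:
                node = child
                path.append(marker)
                break
        else:
            # No child marker is derived: the current node is the deepest assignment.
            break
    return ("unassigned" if node == "root" else node), path


if __name__ == "__main__":
    print(assign_haplogroup({"RPS4Y711", "PK2"}))
    print(assign_haplogroup({"M89", "M9", "M207", "M173", "M17"}))
```

A chromosome derived for RPS4Y711 and PK2 is placed in C3, whereas one derived only for M89 stays in F, which mirrors how unresolved chromosomes are reported as paragroups in the tables.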
clade DE* were observed mainly in the southern populations. Except for the
Mohanna, this haplogroup was observed in all southern populations with frequencies
between 1.5% and 10.6%. The Pathans were the only northern population in which
these chromosomes were observed (2.1%). Several off-shoots of DE* clade were
haplogroup E* and carried the derived allele for SRY-8299. Further sub-typing of clade
Negroid (9.1%). These chromosomes were also found in the Makrani-Baloch (3.7%),
Brahui (3.4%) and in Baloch (1.5%). The remaining YAP+ chromosomes carried the
Table V: Frequency of haplogroups B*, C*, E* and F* in ethnic groups
from Pakistan.
Population n B* C* E* F*
Northern
Balti 14 - - - 1.000
Kalash 44 - - - 1.000
Kashmiri 12 - - - 1.000
Southern
Figure III: Distribution of haplogroups B*, C*, E* and F* in populations from
Figure IV: Y haplogroups frequency distribution in ethnic groups of Pakistan.
derived allele for E1b1b1*-M35 haplogroup. This clade comprises six main branches
which have a wide distribution in Africa, Asia and Europe. Of these, the E1b1b1a*-
interesting that only two YAP+ populations i.e., Baloch from southern group and
6.1% and 2.1%, respectively. The majority of the southern populations carry the
derived allele for the M123 marker. The frequency of E1b1b1c*-M123 haplogroup
was 5.6% in the Parsi, 3.7% in the Makrani-Baloch, 2.2% in the Sindhi and 1.5% in
the Baloch. The derived allele for M89 was observed at very high frequency in
representatives from all population groups of Pakistan except for the Hazara. The
was highest in the Kalash from northern Pakistan. Haplogroup G* was also observed
in all southern populations except for the Baloch and Makrani-Baloch tribes. Low
One major sub-clade of this haplogroup G2a*, which is derived for the P15
populations. Among the northern group only Kalash and Pathan Y-chromosome
exhibits a frequency of 4% in Pakistan. The highest frequency was found in the Balti
(7.1%), Kalash (20.4%), Punjabi (7.6%), Makrani Negroid (6.1%) and Sindhi (5.8%)
samples (Table VI and Figure V). Individuals carrying the derived allele for H1* clade
were further sub-typed for two markers that identify clade H1a1-M36 and H1a2-M97.
Haplogroup I*-M170, A-C mutation on the Y chromosome is thought to have
<0.1% as it was only observed in one individual belonging to the Hazara population.
Clade J*, characterized by the 12f2a deletion, was widely distributed across
Pakistan. The majority of these Y chromosomes were represented by the J2a2* (M-
J2a2* haplogroup was found in all ethnic groups examined and constituted 10% of
the population (Figure V). One offshoot of the J2a2* haplogroup, the J2a2a*
haplogroup characterized by the derived allele for the biallelic marker M92, was
observed in one southern population, the Brahui (8.5%). The other main branch of
the J lineage, J1*-M267, was also observed in this population in addition to the
Baloch, Makrani-Baloch and Sindhi from southern Pakistan. The Pathan was the
only northern group that carried the J1* haplogroup, albeit at very low frequency
(1.0%).
marker and fall in clades K*-T*. The derived allele for M9 is widespread in Pakistan
and accounts for 61% of all Y-chromosomes, all of which were resolved into sub-
clades L*, NO*, Q*, R* and T*. Lineages K1-K4, that are a component of the Asian
Pakistani population with frequency ranging from 1.1%-24.2%. Of the three well
Pakistan is L1 that has the derived allele for M27. L1 occurs at an average
24.2% in the Baloch and 1.4% in the Parsi. Among the northern populations this
haplogroup is observed only in the Pathan and Punjabi (Tables VI, VII and Figure V).
The L2*-M317 haplogroup, another offshoot of L*, was observed in only two southern
populations, the Parsi and Makrani-Baloch, at frequencies of 13.3% and 3.7%,
respectively. The remaining branch L3* had a more widespread distribution and the
highest frequency was observed in the northern Burusho and Balti populations (Table
VI and Figure VI). L3a, a branch of L3*, characterized by the marker PK3 appears
An extremely low frequency of the NO* clade was observed in Pakistan. The
northern (Burusho and Pathan) and two southern (Brahui and Mohanna) populations
only. The N1* (LLY22g derived) Y chromosomes were present in a Brahui and
Mohanna individual. The newly discovered haplogroup O2a1a-PK4 was found only
in the Pathan (4.2%) but the East Asian O3* M122 derived haplogroup was observed
in the Brahui (<1%), Burusho (3.1%) and Pathan (1%) samples. LY1 derived
Two major Y haplogroups Q*-M242 and R*-M207 branch off clade P* that is
delineated by numerous SNPs including 92R7, M45 and M74. All P* chromosomes
Pathan and Punjabi) and four southern (Baloch, Brahui, Makrani-Baloch and Sindhi)
populations.
in Pakistan. It has two major branches R1* (M173 derived) and R2 (M124 derived)
Europe, West and Central Asia occurs at an average frequency of 4.8% and is
observed in all the Pakistani populations (Table VI and Figure VI). One derivative of
present in all populations included in this study (Table VI and Figure VI). The highest
frequency of R1a1* was observed in the Mohanna (71.4%) and the lowest in the Parsi
(7.8%). Other populations with appreciable (>50%) frequency of R1a1* included the
Kashmiri (58.3%), Punjabi caste (56.7%), and Sindhi (51.4%). On the background of
Haplogroup R2 that has the M124 derived allele occurs in many Pakistani
populations and has an average frequency of 5.8%. Except for the Mohanna it is
observed in all southern populations. Its distribution is patchy in the north of Pakistan
and it is found only in the Burusho, Kashmiri and Punjabi populations (Figure VI).
derived allele for M70 and was only found in a single Pathan individual.
Table VI: Number and frequency (%) of individuals in each population falling in haplogroups B-I.
Population n     No. of       B       C           C3        E           E1b1a      E1b1b1a  E1b1b1c  F         G*        G2a       H1        I
                 haplogroups  (M60)   (RPS4Y711)  (PK2)     (SRY-8299)  (sY81=M2)  (M78)    (M123)   (M89)     (M201)    (P15)     (M52)     (M170)
North
Balti      14    6            0       0           0         0           0          0        0        0         0         0         1(7.1)    0
Burusho    97    15           0       0           8(8.2)    0           0          0        0        1(1.0)    1(1.0)    0         4(4.1)    0
Hazara     224   9            0       0           134(60)   0           0          0        0        13(5.8)   0         0         0         1(0.5)
Kalash     44    8            0       0           0         0           0          0        0        0         0         8(18.1)   9(20.4)   0
Kashmiri   12    5            0       0           0         0           0          0        0        0         0         0         0         0
Pathan     96    16           0       0           0         0           0          2(2.1)   0        2(2.1)    10(10.4)  1(1.0)    4(4.2)    0
Punjabi    185   14           0       3(1.6)      0         0           0          0        0        7(4.0)    1(0.54)   0         14(7.6)   0
South
Baloch     66    13           0       0           0         1(1.5)      1(1.5)     4(6.1)   1(1.5)   1(1.51)   0         0         0         0
Brahui     117   18           1(0.9)  2(2.0)      0         0           4(3.4)     0        0        0         0         9(8.0)    1(1.0)    0
Makrani-B  27    11           0       0           0         0           1(3.7)     0        1(3.7)   0         0         0         0         0
Makrani-N  33    11           1(3.0)  0           0         1(3.0)      3(9.1)     0        0        0         0         1(3.0)    2(6.1)    0
Mohanna    70    9            0       3(4.3)      0         0           0          0        0        0         1(1.4)    3(4.3)    2(2.9)    0
Parsi      90    11           0       0           0         0           0          0        5(5.6)   0         0         1(1.1)    2(2.2)    0
Sindhi     138   13           0       0           0         0           0          0        3(2.2)   2(1.5)    0         2(1.5)    8(5.8)    0
Total      1213  33           2       8           142       2           9          6        10       26        13        25        47        1
%                             (0.2)   (0.7)       (11.7)    (0.2)       (0.7)      (0.5)    (0.8)    (2.1)     (1.1)     (2.1)     (4.0)     (0.08)
Table VI: Number and frequency (%) of individuals in each population falling in haplogroups J-L.
Population n     J         J1       J2       J2a2       J2a2a     L        L1         L2         L3         L3a
                 (12f2a)   (M267)   (M172)   (M67)      (M92)     (M20)    (M27)      (M317)     (M357)     (PK3)
North
Balti      14    0         0        0        2(14.3)    0         0        0          0          2(14.3)    0
Burusho    97    0         0        1(1.0)   7(7.2)     0         3(3.1)   0          0          14(14.4)   0
Hazara     224   21(9.4)   0        3(1.4)   1(0.5)     0         0        0          0          0          0
Kalash     44    0         0        0        4(9.1)     0         1(2.3)   0          0          0          10(23.0)
Kashmiri   12    1(8.3)    0        0        1(8.3)     0         0        0          0          0          0
Pathan     96    0         1(1.0)   0        5(5.2)     0         0        5(5.2)     0          7(7.3)     0
Punjabi    185   1(0.54)   0        0        18(9.7)    0         2(1.1)   15(8.2)    0          4(2.2)     0
South
Baloch     66    0         2(3.0)   0        6(9.1)     0         0        16(24.2)   0          3(4.5)     0
Brahui     117   5(4.3)    6(5.1)   0        10(8.5)    10(8.5)   0        7(6.0)     0          2(1.7)     0
Makrani-B  27    0         1(3.7)   0        5(18.5)    0         1(3.7)   2(7.4)     1(3.7)     0          0
Makrani-N  33    0         0        0        6(18.1)    0         0        2(6.1)     0          1(3.0)     0
Mohanna    70    0         0        0        3(4.3)     0         1(1.4)   6(8.6)     0          0          0
Parsi      90    0         0        0        35(38.9)   0         3(3.3)   1(1.4)     12(13.3)   0          0
Sindhi     138   2(1.45)   4(3.0)   0        19(14.0)   0         0        6(4.4)     0          4(3.0)     0
Total      1213  30        14       4        122        10        11       60         13         37         10
%                (2.5)     (1.2)    (0.3)    (10.1)     (0.8)     (0.9)    (5.0)      (1.1)      (3.0)      (0.8)
Table VI: Number and frequency (%) of individuals in each population falling in haplogroups N-T.
Population n     N1        O2a1a    O3       O3a3a    Q         R          R1         R1a1       R1a1e    R2         T
                 (LLY22g)  (PK4)    (M122)   (L1Y)    (M242)    (M207)     (M173)     (M17)      (PK5)    (M124)     (M70)
North
Balti      14    0         0        0        0        0         2(14.3)    1(7.1)     6(43.0)    0        0          0
Burusho    97    0         0        3(3.1)   0        2(2.1)    11(11.3)   1(1.0)     25(25.8)   2(2.1)   14(14.3)   0
Hazara     224   0         0        0        0        4(2.0)    0          26(11.6)   21(9.4)    0        0          0
Kalash     44    0         0        0        0        0         3(7.0)     1(2.3)     8(18.1)    0        0          0
Kashmiri   12    0         0        0        0        0         0          2(16.6)    7(58.3)    0        1(8.3)     0
Pathan     96    0         4(4.2)   1(1.0)   0        5(5.2)    1(1.0)     4(4.2)     43(44.8)   0        0          1(1.0)
Punjabi    185   0         0        0        0        1(0.55)   2(1.1)     4(2.1)     105(56.7)  0        8(4.3)     0
South
Baloch     66    0         0        0        0        2(3.1)    0          4(6.1)     19(28.8)   0        6(9.1)     0
Brahui     117   1(0.8)    0        1(0.8)   1(1.0)   1(1.0)    0          3(2.6)     45(38.4)   0        8(7.0)     0
Makrani-B  27    0         0        0        0        1(3.7)    0          1(3.7)     9(33.3)    0        4(15)      0
Makrani-N  33    0         0        0        0        0         0          4(12.1)    10(30.3)   0        2(6.1)     0
Mohanna    70    1(1.43)   0        0        0        0         0          0          50(71.4)   0        0          0
Parsi      90    0         0        0        0        0         1(1.1)     4(4.4)     7(7.8)     0        19(21.1)   0
Sindhi     138   0         0        0        0        6(4.3)    0          3(2.2)     71(51.4)   0        8(6.0)     0
Total      1213  2         4        5        1        22        20         58         426        2        70         1
%                (0.2)     (0.3)    (0.4)    (0.1)    (1.8)     (1.6)      (4.8)      (35.1)     (0.2)    (5.8)      (0.1)
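The percentages in Table VI are counts divided by the population sample size, and the "%" row is the pooled frequency over all 1,213 chromosomes rather than the mean of the population percentages. The minimal sketch below checks this for the R1a1 (M17) column; the counts and sample sizes are copied from Table VI, while the variable and function names are illustrative.

```python
# (R1a1 count, sample size n) per population, copied from Table VI.
R1A1 = {
    "Balti": (6, 14), "Burusho": (25, 97), "Hazara": (21, 224),
    "Kalash": (8, 44), "Kashmiri": (7, 12), "Pathan": (43, 96),
    "Punjabi": (105, 185), "Baloch": (19, 66), "Brahui": (45, 117),
    "Makrani-B": (9, 27), "Makrani-N": (10, 33), "Mohanna": (50, 70),
    "Parsi": (7, 90), "Sindhi": (71, 138),
}


def percent(count, n):
    """Frequency as a percentage, rounded to one decimal place."""
    return round(100.0 * count / n, 1)


def pooled_percent(table):
    """Pooled frequency: total count over total sample size."""
    total_count = sum(c for c, _ in table.values())
    total_n = sum(n for _, n in table.values())
    return percent(total_count, total_n)


def mean_of_percentages(table):
    """Unweighted mean of the per-population percentages (for comparison)."""
    values = [100.0 * c / n for c, n in table.values()]
    return round(sum(values) / len(values), 1)


if __name__ == "__main__":
    print("Mohanna:", percent(*R1A1["Mohanna"]))
    print("Pooled:", pooled_percent(R1A1))
    print("Mean of population percentages:", mean_of_percentages(R1A1))
```

Because sample sizes range from 12 (Kashmiri) to 224 (Hazara), the pooled value is weighted towards the larger samples.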
Table VII: Y lineages found in the three Punjabi castes examined in this study.
Populations n    No. of       C*          F*       G*       H1*      J*       J2a2*     L*       L1       L3*      Q*      R*      R1*      R1a1*    R2
                 haplogroups  (RPS4Y711)  (M89)    (M201)   (M52)    (12f2a)  (M67)     (M20)    (M27)    (M357)   (M242)  (M207)  (M173)   (M17)    (M124)
Gujar       159  13           2(1.3)      6(3.8)   1(0.6)   14(8.8)  1(0.6)   17(10.6)  2(1.3)   15(9.4)  4(2.5)   0       1(0.6)  3(1.3)   86(55)   7(4.4)
Meo         16   4            1(6.2)      0        0        0        0        1(6.3)    0        0        0        0       0       1(6.25)  13(81)   0
Rajput      10   5            0           1(10)    0        0        0        0         0        0        0        1(10)   1(10)   0        6(60)    1(10)
Figure V: Distribution of major Y lineages (PK2, M52, M67 and M27) frequencies
Figure VI: Distribution of major Y lineages (M357, M173, M17 and M124)
PHYLOGENETIC ANALYSES
population relationships. This analysis is based upon the frequencies of thirty three
Y haplogroups in Pakistani ethnic groups. The first two principal components, PC1 and PC2,
account for 72% of the variation in the population (Figure VII). The PC analysis
shows that all the Pakistani populations group together, with the exception of the
Hazara, who are relatively distinct from other Pakistani ethnic groups and are
clustered in the lower right quadrant of the graph. Interestingly, other populations
such as the Brahui and Balti, which are linguistically different from the others, and the
Kalash, who are isolated, did not stand out and grouped with the other ethnic groups from
Pakistan.
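To illustrate how a principal component analysis of haplogroup frequencies works, the sketch below computes the first two components for a small subset of Table VI (five populations and five haplogroups). It uses plain power iteration on the covariance matrix and is an assumption-laden illustration only: the software and settings actually used for Figure VII are not reproduced here, and the subset will not give the 72% reported for the full data set.

```python
# Counts copied from Table VI for haplogroups C3, J2a2, L1, R1a1, R2.
# Only a subset of populations and haplogroups is used, for illustration.
COUNTS = {
    "Hazara": (224, [134, 1, 0, 21, 0]),
    "Kalash": (44, [0, 4, 0, 8, 0]),
    "Mohanna": (70, [0, 3, 6, 50, 0]),
    "Parsi": (90, [0, 35, 1, 7, 19]),
    "Pathan": (96, [0, 5, 5, 43, 0]),
}
FREQS = [[c / n for c in counts] for n, counts in COUNTS.values()]


def center(rows):
    """Subtract the column mean from every column."""
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(x):
    """Sample covariance matrix between columns (haplogroups)."""
    n, p = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def top_eigen(c, iters=2000):
    """Largest eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    p = len(c)
    v = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(a * a for a in w) ** 0.5
        v = [a / norm for a in w]
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def pca_two_components(rows):
    """Return [(fraction of variance, scores)] for PC1 and PC2."""
    x = center(rows)
    c = covariance(x)
    total = sum(c[i][i] for i in range(len(c)))
    results = []
    for _ in range(2):
        lam, v = top_eigen(c)
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        results.append((lam / total, scores))
        # Deflate: remove the component just found before extracting the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(len(c))]
             for i in range(len(c))]
    return results


if __name__ == "__main__":
    (f1, s1), (f2, s2) = pca_two_components(FREQS)
    print(f"PC1 {f1:.1%}, PC2 {f2:.1%} of the variance in this subset")
    for name, a, b in zip(COUNTS, s1, s2):
        print(f"{name:8s} {a:+.3f} {b:+.3f}")
```

Populations with similar haplogroup profiles receive similar scores and plot close together, which is the basis for reading clusters off Figure VII.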
PHYLOGENETIC ANALYSIS:
Analysis of Molecular Variance (AMOVA) was carried out using the Arlequin
software. The populations were grouped on the basis of ethnicity, geographic origin
and linguistic affiliation. On the basis of this analysis we inferred that, ethnically,
the populations were significantly different from each other (p value Va vs FCT:
The pair-wise FST values between Pakistani ethnic groups based on the
significance; based upon 110 permutations among the Pakistani populations with
significance level of 0.05; also demonstrated that significant variation occurs among
Figure VII: Principal component analysis based on Y haplogroup frequencies
in Pakistani populations.
Balti: Blt, Burusho: Bsk, Hazara: Hzr, Kalash: Kal, Kashmiri: Ksr, Pathan: Pkh,
Gujar: Gjr, Meo: Meo, Rajput: Rpt, Baloch: Ball, Brahui: Bru, Makrani-Baloch:
Table VIII: Percentage of variation obtained by AMOVA at three levels of population hierarchy in ethnic groups from Pakistan.
Basis for    Number of  Percentage of variation                   Variance components  Fixation indices            p value, Va vs FCT
grouping     groups     Among    Among populations  Within        Va        Vb         FCT      FSC      FST       (1023 permutations)
                        groups   within groups      populations
None         1          -        15.22              84.78         0.0649    0.3617     -        -        0.1522    -
Ethnicity    13         14.45    0.90               84.65         0.0618    0.0038     0.0105   0.1445   0.1535    0.0205  0.0050
Geographic   2          1.12     14.52              84.36         0.0048    0.0623     0.0112   0.1469   0.1564    0.4076  0.0167
Linguistic   4          -8.99    19.34              89.65         -0.0363   0.0780     -0.0899  0.1774   0.1035    0.9746  0.0047
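The fixation indices in Table VIII can be recovered from the percentages of variation using the standard AMOVA definitions (FCT: among groups over total; FSC: among populations within groups over the within-group total; FST: all between-population variation over total). The sketch below applies them to the Geographic and Linguistic rows; the function name is illustrative and this is not the Arlequin implementation.

```python
def fixation_indices(pct_among_groups, pct_among_pops, pct_within_pops):
    """Return (FCT, FSC, FST) from AMOVA percentages of variation."""
    total = pct_among_groups + pct_among_pops + pct_within_pops
    fct = pct_among_groups / total
    fsc = pct_among_pops / (pct_among_pops + pct_within_pops)
    fst = (pct_among_groups + pct_among_pops) / total
    return fct, fsc, fst


if __name__ == "__main__":
    # Percentages copied from Table VIII.
    rows = {
        "Geographic": (1.12, 14.52, 84.36),
        "Linguistic": (-8.99, 19.34, 89.65),
    }
    for name, pcts in rows.items():
        fct, fsc, fst = fixation_indices(*pcts)
        print(f"{name:11s} FCT={fct:.4f} FSC={fsc:.4f} FST={fst:.4f}")
```

Small differences in the last decimal place relative to the table are expected because the tabulated percentages are already rounded. The three indices are linked by (1 - FST) = (1 - FSC)(1 - FCT), and a negative among-group component, as in the Linguistic row, is usually read as no detectable structure at that level.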
Table IX: Population pairwise FSTs between Pakistani ethnic groups computed from Y haplogroup frequencies.
FST p values (based upon 110 permutations) are given above the diagonal with * indicating significant pairwise
differences.
Population               BAL       BRU       MAKB      MAKN      MHN       PRS       SDH       BLT       BSK       HZR       KAL       KSR       PKH       MEO       GJR       RPT
Baloch (BAL)             -         0.0000*   0.3153    0.0630    0.0000*   0.0000*   0.0000*   0.1081    0.0000*   0.0000*   0.0000*   0.0360*   0.0000*   0.0000*   0.0000*   0.0360*
Brahui (BRU)             0.0275    -         0.3063    0.1982    0.0000*   0.0000*   0.0180*   0.3243    0.0000*   0.0000*   0.0000*   0.2882    0.0090*   0.0000*   0.0000*   0.1801
Makrani Baloch (MAKB)    0.0053    0.0016    -         0.8018    0.0000*   0.0090    0.0720    0.3423    0.0991    0.0000*   0.0000*   0.3063    0.05405*  0.0000*   0.0180*   0.1982
Makrani Negroid (MAKN)   0.0146    0.0088    -0.0146   -         0.0000*   0.0000*   0.0270*   0.5495    0.0090*   0.0000*   0.0000*   0.3063    0.0180*   0.0090    0.0000*   0.0810
Mohanna (MHN)            0.1405    0.0774    0.1280    0.1392    -         0.0000*   0.0000*   0.0180*   0.0000*   0.0000*   0.0000*   0.1711    0.0000*   0.5225    0.0000*   0.2973
Parsi (PRS)              0.1148    0.1268    0.0539    0.0728    0.3099    -         0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*
Sindhi (SDH)             0.0549    0.0172    0.0143    0.0284    0.0376    0.1647    -         0.4234    0.0000*   0.0000*   0.0000*   0.6486    0.0270*   0.0720    0.3783    0.5855
Balti (BLT)              0.0339    0.0058    0.0019    -0.0087   0.0899    0.1261    -0.0026   -         0.4324    0.0000*   0.0000*   0.5225    0.4774    0.0810    0.1891    0.5585
Burusho (BSK)            0.0458    0.0389    0.0188    0.0273    0.1585    0.0991    0.0629    -0.0000   -         0.0000*   0.0000*   0.0270*   0.0000*   0.0000*   0.0000*   0.0720
Hazara (HZR)             0.2653    0.2603    0.2721    0.2580    0.3997    0.3058    0.3072    0.2882    0.2109    -         0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*
Kalash (KAL)             0.1002    0.0797    0.0799    0.0586    0.2338    0.1374    0.1224    0.0650    0.0759    0.2818    -         0.0000*   0.0000*   0.0000*   0.0000*   0.0090
Kashmiri (KSR)           0.0535    0.0052    0.0117    0.0149    0.0224    0.1798    -0.0144   -0.0124   0.0591    0.3150    0.1299    -         0.3513    0.3243    0.4864    0.8918
Pathan (PKH)             0.0418    0.0193    0.0264    0.0272    0.0580    0.1721    0.0129    -0.0075   0.0467    0.2812    0.1024    0.0023    -         0.0000*   0.0090    0.3693
Meo (MEO)                0.1653    0.0943    0.1408    0.1459    -0.0113   0.3160    0.0470    0.1031    0.1675    0.4194    0.2485    0.0112    0.0720    -         0.0630    0.3693
Gujjar (GJR)             0.0582    0.0279    0.0329    0.0416    0.0255    0.1941    0.0002    0.0062    0.0772    0.3193    0.1354    -0.0074   0.0164    0.0415    -         0.4864
Rajput (RPT)             0.0590    0.0115    0.0216    0.0464    0.0096    0.2071    -0.0135   -0.0106   0.0429    0.3293    0.1292    -0.0389   -0.0047   0.0216    -0.0099   -
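The logic of a permutation test for a pairwise FST can be sketched as follows: compute the statistic for the two populations, then repeatedly shuffle individuals between them and count how often the shuffled statistic is at least as large as the observed one. The estimator below is a simple diversity-based one, (HT - HS) / HT, chosen for brevity; it is an assumption for illustration, it is not the estimator used by Arlequin, and it will not reproduce the values in Table IX (0.3099 for Mohanna and Parsi, -0.0113 for Mohanna and Meo). The haplogroup counts are copied from Tables VI and VII.

```python
import random
from collections import Counter

# Haplogroup counts copied from Table VI (Mohanna, Parsi) and Table VII (Meo).
MOHANNA = {"C": 3, "G": 1, "G2a": 3, "H1": 2, "J2a2": 3, "L": 1, "L1": 6,
           "N1": 1, "R1a1": 50}
PARSI = {"E1b1b1c": 5, "G2a": 1, "H1": 2, "J2a2": 35, "L": 3, "L1": 1,
         "L2": 12, "R": 1, "R1": 4, "R1a1": 7, "R2": 19}
MEO = {"C": 1, "J2a2": 1, "R1": 1, "R1a1": 13}


def expand(counts):
    """Turn {haplogroup: count} into one label per individual."""
    return [hg for hg, c in counts.items() for _ in range(c)]


def gene_diversity(labels):
    """1 minus the sum of squared haplogroup frequencies."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fst(pop_a, pop_b):
    """Simplified FST = (HT - HS) / HT; not the Arlequin estimator."""
    n_a, n_b = len(pop_a), len(pop_b)
    h_s = (n_a * gene_diversity(pop_a) + n_b * gene_diversity(pop_b)) / (n_a + n_b)
    h_t = gene_diversity(pop_a + pop_b)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_p(pop_a, pop_b, n_perm=110, seed=1):
    """Return (observed FST, fraction of permutations with FST >= observed)."""
    rng = random.Random(seed)
    observed = fst(pop_a, pop_b)
    pooled = pop_a + pop_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    for name, other in (("Parsi", PARSI), ("Meo", MEO)):
        obs, p = permutation_p(expand(MOHANNA), expand(other))
        print(f"Mohanna vs {name}: FST={obs:.4f}, p={p:.4f}")
```

With only 110 permutations the p value can take only a limited set of values, so very small p values cannot be distinguished from zero.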
Table X: Matrix of significant FST p values (significance level = 0.0500) based upon 110 permutations among the
ethnic groups of Pakistan.
Population BAL BRU MAKB MAKN MHN PRS SDH BLT BSK HZR KAL KSR PKH MEO GJR RPT
MEDIAN-JOINING NETWORK:
lineage network (Figure VIII). The L lineage is considered to have arisen in the Indus Valley
region during the Indus Valley civilization. The network revealed four clusters,
haplogroup, samples carrying the L2*-M317 haplogroup were encircled in green and
L3a-PK3 samples were encircled in yellow. The remaining samples carry L3*-M357
Pakistani populations; conversely this network shows a high degree of population-
upper right end containing 15 of 16 Parsis. The Kalash fall into two clusters and
the Burusho make a cluster at the middle of the network. Haplotype sharing is the other
striking feature of this network. Within a specific population, for example, the
Burusho, Kalash and Parsi share some haplotypes. However, the four Baloch
individuals shared their haplotype with Sindhi and Makrani-Baloch individuals from
nearby southern population. Similarly, one haplotype was shared between a Brahui
Figure VIII: Median-joining network of Lineage L individuals based on YSTR
haplotypes.
SECTION 2
POPULATIONS:
The current study also included three ethnic groups from northern Pakistan,
the Burusho, Kalash and Pathan, that claim Greek ancestry. These populations
were compared with extant Greek samples from Europe that were genotyped for the
Greeks, Burusho, Kalash, Pathan and the rest of the Pakistani populations are
POPULATIONS:
Only eight Y haplogroups were found in the Kalash population. More than 75% of
these samples were represented by haplogroups which are frequent in West Asia,
A comparison of the three Pakistani ethnic groups with the Greek populations
shows that certain haplogroups are shared between these populations. These
include clades E*, F*, I*, J*, R1* and T*. The majority of the Pakistani and Greek Y
chromosomes have the derived allele for the M207 marker that encompasses
branches R1* and R1a1* of the Y chromosome phylogenetic tree (Figure IX). R1a1*
was the most common haplogroup found in Pakistan (35.9%) and Greece (15.6%).
Compared to the Greeks, the frequency of haplogroup R1a1* was relatively higher in
the Pathan (44.8%), Burusho (25.8%) and Kalash (18.2%) samples. Clade R1*,
represented by the derived allele for SNP M173, was observed in 11.7% of the Greek
and 5.32% of the Pakistani samples. The Greek population exhibited a higher
frequency of this clade in comparison with the Burusho (1.03%), Kalash (2.27%) and
Pathan (4.2%).
Haplogroup J* was the other haplogroup that was found at a high frequency
in the Greek (17%) and Pakistani (14.8%) samples. The overwhelming majority of
comparable frequency in Pakistan. This haplogroup J2* (including all its derivatives)
was present at a frequency of 15.6% in the Greek, 8.2% in the Burusho, 9.09% in the
belonged to haplogroup J2a2*, being derived for the marker M67. The Greek
samples could not be typed for this SNP due to lack of DNA. The J1* haplogroup
characterized by the derived allele for M267 was absent in the Burusho and Kalash
populations and was found at low (1%) frequency in the Greek and Pathan.
E1b1b1* (M35 derived) and all Greek and Pakistani samples were resolved into the
branches E1b1b1a* (M78 derived) and E1b1b1c* (M123 derived). Among the three
Pakistani populations claiming Greek descent the M78 derived Y chromosomes were
observed only in the Pathan (2%). This branch constituted 16.9% of the Greek
samples. Clade E1b1b1c* was present at a frequency of only 2.6% in the Greek and
was absent in the Burusho, Kalash, Pathan populations. Its frequency in the
G2a* haplogroup characterized by the T allele for SNP P15 (Hammer et al., 2000).
This haplogroup was observed in 18.18% of Kalash and 1% of the Pathan samples
Two branches that frequently characterize Y chromosomes found outside Africa are
(Rootsi et al., 2004; Underhill et al., 2001). One Greek sample belonged to
1998). These Y chromosomes are not found in Pakistan but have been observed in
neighboring India and this is the first time they have been observed in Greece.
to Europe and was observed in 19.5% of the Greek sample. This haplogroup was
not observed in the Burusho, Kalash or Pathan and its frequency in Pakistan was <
0.2%.
were represented by 2% of the Pathan and 1% of the Greek and Burusho samples. It
is possible that in this case distinct haplogroups, as yet unknown, are being classified
based upon Y haplogroup frequencies in the Greek and Pakistani ethnic groups was
carried out (Figure X). The first two principal components, PC1 and PC2, account for
79% of the variation in the haplogroup frequency data and separate the populations according to their
geographic locations. The plot shows the Pathan and Burusho populations clustering
with the remaining Pakistani populations in the upper right quadrant of the graph.
The Kalash and Greek form two separate and distinct clusters. To ensure that the
Greek individuals included in this study were representative of the Greek population
studied earlier, results of comparable biallelic data (Francalacci et al., 2003) were
incorporated in the principal component analysis (Figure XI). The Greek population
included in this study clustered with the Greek populations studied earlier but the
Figure X: A plot of the first two principal coordinates based upon the analysis
Figure XI: A plot of the first two principal coordinates based upon the analysis
GENETIC DISTANCES AND PHYLOGENETIC ANALYSIS:
measures that are more sensitive to recent events (Table XI). The Pakistani-Greek
population pair wise FST values based on the variation of STRs within haplogroups
(Qamar et al., 2002) ranged from 0.131 to 0.213, with the lowest value between the
Pathan and the Greeks. Pairwise genetic distances (the number of steps between
a haplotype in one population and the closest haplotype in the second population,
averaged over all comparisons) (Bandelt et al., 1999) ranged from 4.3 to 8.1, with the
replicates) also demonstrated that of the three Pakistani populations, the Pathans
Together, these results suggest that there might have been a low
NETWORK software using Y-STR frequencies was carried out to investigate this
possibility further.
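The haplotype-based distance described above (the number of steps from each haplotype in one population to the closest haplotype in the other, averaged over all comparisons) can be sketched as below. Several details are assumptions made for illustration: a step is counted as one repeat-unit difference at one locus, the two directions are averaged with equal weight, no haplogroup weighting is applied, and the three-locus haplotypes in the example are hypothetical rather than data from this study.

```python
def steps(hap_a, hap_b):
    """Number of single repeat-unit steps separating two Y-STR haplotypes."""
    return sum(abs(a - b) for a, b in zip(hap_a, hap_b))


def nearest_distance(pop_a, pop_b):
    """Mean distance from each haplotype in pop_a to its closest haplotype in pop_b."""
    return sum(min(steps(h, g) for g in pop_b) for h in pop_a) / len(pop_a)


def pairwise_distance(pop_a, pop_b):
    """Symmetric version: average of the two directional distances (an assumption)."""
    return 0.5 * (nearest_distance(pop_a, pop_b) + nearest_distance(pop_b, pop_a))


if __name__ == "__main__":
    # Hypothetical repeat counts at three loci, for illustration only.
    pop_a = [(15, 13, 29), (14, 13, 30)]
    pop_b = [(15, 13, 29), (16, 12, 29)]
    print(pairwise_distance(pop_a, pop_b))
```

A shared haplotype contributes a distance of zero, so populations that share haplotypes are pulled closer together under this measure.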
Table XI: Weighted population pair wise genetic distances (below diagonal)
and FST values (above diagonal) based on STR variation within haplogroups.
Figure XII: Neighbor-joining tree showing the relationship between the Greek
and three Pakistani ethnic groups. The tree is based on genetic distances.
MEDIAN-JOINING NETWORK:
constructed in order to examine the genetic relationship between the Greek and
Pathan samples. A duplication of 10 and 13 repeat units was observed in the clade-
E derived Y chromosomes for the tri-nucleotide repeat DYS425 and this locus was
subsequently excluded from the network. The most striking feature of this network
was the sharing of haplotypes between the Pathan and Greek samples (Figure XIII).
One Pathan individual shared the same Y-STR haplotype with three Greek
individuals, and the other Pathan sample was separated from this cluster by a single
mutation at the DYS436 locus. This demonstrates a very close relationship between
Figure XIII: Median-joining network of clade E* lineages in Pakistan (open
circles) and Greece (hatched circles). Circles represent haplotypes and have
CONTOUR MAPPING:
the Greek and Pathan clade E1b1b1a* individuals was checked in the Y-STR
Haplotype Reference Database (YHRD; Roewer et al., 2001). Worldwide data for
DYS438, DYS436, DYS439 were not available in this database. However, part of
this haplotype based upon a subset of nine Y-STRs (DYS19=15; 389I=13; 389II=29;
worldwide population sample of 7,897 haplotypes. This haplotype was highly specific
for the Balkans. The contour map of this haplotype (Figure XIV) shows a major
concentration in the Balkans, around Macedonia and Greece, with a low scattering in
other European countries and a comparable frequency in Tunisia and West Africa
and the Pathan. This gives a strong indication of a European, possibly Greek,
Figure XIV: Contour map showing the 9 Y-STR haplotype frequency
DISCUSSION
-6-
Our DNA is inherited from our ancestors, so genetic analysis can be used to
this respect because most of it is passed down from father to son without change,
Pakistan lies on the postulated southern coastal route out of Africa. The
earliest evidence suggests this region was colonized about 60,000-70,000 years ago.
Pakistan was the site of several ancient cultures such as Mehrgarh, one of the
Baluchistan (Jarrige, 1991) and evidence from this region indicates that modern
humans were settled in this region during the Neolithic period. The region's other
earliest civilizations were the Indus Valley civilization at Harappa and Mohenjo-Daro.
racial groups due to the invasion of the region throughout the millennia. Thus, it is
Present day Pakistan is bordered by Iran and Afghanistan on the west, India
towards the east and China in the north. The Indian Ocean straddles its entire
lineage that originated from a single male ancestor somewhere in the world in the
past. The discovery of new paragroups and the formerly discovered lineages have
1. The genetic diversity within Pakistani ethnic groups from the male
perspective.
populations claim that they are descended from the Greek
soldiers who were left behind in this region by Alexander the Great.
PART 1
characterized into two categories; the northern group that incorporated the Punjabi
populations and a southern group. The northern populations that were screened
included the Balti, Burusho, Hazara, Kalash, Kashmiri, Pathan and the Punjabi castes (Gujar,
Meo and Rajput). The populations from the south of Pakistan included
frequent in South Asia, Europe and the Mediterranean region, together make up 60%
of the Pakistani populations. It was also observed that the southern population group
(45%) of southern populations carry these 33 Y haplogroups, whereas they are found
in 39% and 15% of northern and Punjabi populations respectively. In this study, we
also screened 1,213 Pakistani individuals for five novel Y-SNPs PK1-PK5
(Mohyuddin et al., 2006). Three SNPs identify population specific haplogroups within
Pakistan. L3a-PK3 was found solely in the Kalash population, the O2a1a-PK4 was
it is observed that all the ethnic groups from Pakistan cluster together except the
culturally and linguistically isolated ethnic groups such as the Kalash, Burusho and
the Dravidian-speaking Brahui; however, they do not stand out in the overall
comparison.
Haplogroup C* and its offshoot separate the northern and
southern regions within Pakistan. The C*-RPS4Y haplogroup was only found in two
southern populations, the Mohanna (4.3%) and Brahui (2%). Interestingly, the
Punjabis from the northern part carry this haplogroup (1.6%) as well (Table VI).
was found only in two northern ethnic groups (Table VI). This haplogroup was
highest among the Hazara (60%) followed by the Burusho (8.2%). The C*-RPS4Y
haplogroup is fairly common in Central Asia and Mongolia and it points towards the
1979) and genetically (Qamar et al., 2002; Zerjal et al., 2003). However, the origin of
Burusho is not well documented. Some claim that they are the descendants of
Greek soldiers while some others claim that they are descendants of Dards from
Central Asia (Biddulph, 1977). The analysis of Francalacci and Rootsi shows that
Rootsi et al., 2004). On the other hand, one of the earlier studies shows that the
2001). Furthermore, the studies with the autosomal genetic markers (Ayub et al.,
2003; Mansoor et al., 2004) and markers of Y chromosome (Firasat et al., 2007)
suggest that the Burusho are genetically close to their geographic neighbors. The
Asia suggest that the C*-chromosome arose in Central Asia before the separation of
were also detected with higher frequency in the southern group of Pakistan as
compared to northern and the Punjabi group. Haplogroup E*-SRY-8299 has been
reported to have a North African origin and is not found in northern Pakistani ethnic
groups and the Punjabi group (Qamar et al., 1999). However, a low frequency of this
haplogroup is found in the southern group of Pakistan (0.2%). Haplogroup
E1b1a*-sY81 (M2) is sub-Saharan in origin and is found in the Baloch, Brahui, Makrani-
Baloch and Makrani-Negroid (1.5%, 3.4%, 3.7% and 9.1% respectively) populations of
the south (Table VI). The highest frequency of haplogroup E1b1a*-sY81 is found in
the Makrani-Negroid population (9.1%), who are reported to have a recent African
the genetic legacy of the African slaves that were brought to the Indo-Pakistan
Africa (Semino et al., 2004). The remaining E1b1b1* Pakistani Y chromosomes were
haplogroup was present only in the Pathan (2.1%) from the north and the Baloch (6.1%)
from the south of Pakistan (Table VI). All the E1b1b1*-M35 chromosomes from
Iran (Regueiro et al., 2006), Turkey (Cinnioglu et al., 2004) and in Greece (Firasat et
al., 2007). It is also possible that the clade E haplogroup expands with the spread of
(10.4%). Towards the south the frequency of G*-M201 dramatically decreased and
only 1.4% Mohanna carry this haplogroup (Table VI). Haplogroup G*-M201 occurs at
~ 30% in Georgia (Semino et al., 2000) and the north Caucasus (Nasidze et al.,
2003), 10.9% in Turkey (Cinnioglu et al., 2004), 2.2% in Iraq (Al-Zahery et al., 2003)
and 1.33% in Iran (Regueiro et al., 2006). This haplogroup is also found in southeast
Europe and in the Mediterranean regions (Semino et al., 2000). In contrast to the
haplogroup in the southern group of Pakistan. Except for the Baloch and the Makrani-
Baloch, this haplogroup is found in all other ethnic groups belonging to southern
Pakistan. However, from northern Pakistan only the Kalash and the Pathan carry this
5% in Italy and Greece (DiGiacomo et al., 2003) and 7.33% in Iran and throughout
the Middle East with a maximum of 19 % in the Druze (Hammer et al., 2000).
followed by the Gujar (7.6%), Balti (7.1%), Makrani-Negroid (6.1%), Sindhi (5.8%)
etc. (Tables VI and VII). Many studies have shown that clade H originated
within the Indo-Pak subcontinent (Gayden et al., 2007; Kivisild et al., 2003; Pandya et
al., 1998; Sengupta et al., 2006). The frequency of this indigenous haplogroup was
found to be higher in southern India (Ramana et al., 2001; Wells et al., 2001) than in
the northwest Punjab (Kivisild et al., 2003). Outside India and Pakistan, this
haplogroup was found in Newar (6.1%) and Kathmandu (11.7%) (Gayden et al., 2007)
and in Turkey (0.38%) (Cinnioglu et al., 2004). The other branch of clade H*, H2*-
APT, is also found at higher frequency in India but none of the Pakistani Y-
is widely distributed in Eurasia, Middle East, and in North Africa (Hammer et al.,
all Pakistani populations. A low frequency of J1*-M267 was detected in Pakistani
populations. This haplogroup characterizes African and Arabian populations, and the
frequency of J1*-M267 chromosomes decreases towards the north and east.
This haplogroup was found in Oman (38%) (Luis et al.,
2004), Iraq (33%) (Al-Zahery et al., 2003), Egypt (20%) (Luis et al., 2004), Lebanon
(13%) (Semino et al., 2000), Turkey (9%) (Cinnioglu et al., 2004), Iran (10.5%)
(Regueiro et al., 2006), India (0.27%) and East Asia (0%) (Sengupta et al., 2006),
and in Pakistan (1.2%). The frequencies of this haplogroup indicate the differential
influence of East Africa and the Middle East in southwestern Asia. However, the other
clade of the J* haplogroup, J2*, is distributed mainly in west Asians and
during the dispersal of Neolithic farmers (King and Underhill, 2002). Haplogroup J2*
and its derivative were found at a frequency of 23% in Iran (Regueiro et al., 2006),
22.2% in Turkey (Cinnioglu et al., 2004), 9% in India (Sengupta et al., 2006) and
haplogroup as one moves from the south west to the north east of Pakistan. A
decrease in the frequency of J2* derivatives can be seen east of the Iranian Plateau in
south Pakistan (7.7%), with a dramatic decline in north Pakistan (2.0%) and in
the Punjabi castes (1.5%) (Table VI). Sengupta et al. (2006) show that the J2* clade is
nearly absent in East Asia (1.14%). The presence of J2* and its derivative
gene flow and is supported by the high frequency of this haplogroup in the Parsis.
al., 1997). The L* haplogroup could be of recent origin and may have arisen in the Indus Valley
region during the Indus Valley civilization. This high frequency of L* haplogroup is
south Caucasus populations (Weale et al., 2001), Middle East (Nebel et al., 2001b),
Pakistan (Qamar et al., 2002), India (Kivisild et al., 2003; Sengupta et al., 2006).
However, one of its sub-branches, L1-M27, was found at high frequency in Pakistan
(5%), India (6.32%) (Sengupta et al., 2006) and Iran (2.6%) (Regueiro et al., 2006)
while no L1-M27 chromosome was observed in East Asia (Sengupta et al., 2006)
and in Turkey (Cinnioglu et al., 2004). Comparison among the three Pakistani
haplogroup distribution. A considerable diversity was noticed in populations
VI). This haplogroup is widespread in Europe, the Caucasus, West Asia, Central Asia
and in South Asia (Sengupta et al., 2006); however, it is absent in Africa and the New
Russia/Ukraine in the region between the Black and Caspian Seas. This R1a1*
chromosome spread with the expansion of Kurgan culture (Passarino et al., 2001;
Quintana-Murci et al., 2001; Wells et al., 2001; Sengupta et al., 2006). Recent
studies showed that this chromosome covers the area ranging from India to Norway
(Kivisild et al. 2003; Passarino et al., 2002; Quintana-Murci et al., 2001) but it is
coincided with the arrival of Indo-European nomadic pastoral tribes from West and
Central Asia (Quintana-Murci et al., 2001). However, the study by Sengupta et al.
(2006) revealed the Holocene expansion of this R1a1*-M17 chromosome before the
PART 2
Burusho, Kalash and Pathan who claim descent from the Greek soldiers was
compared with the extant Greek population. For this purpose a combination of 93
biallelic Y chromosome SNPs (Table II) and a set of 16 Y-STRs were used
(Table IV). This extensive analysis of Y diversity within Greeks and three Pakistani
The genetic relationship between the three Pakistani populations and the
Greeks can now be judged in the light of phylogenetic analyses and corresponding
statistical results. The phylogenetic results (Figure IX) showed that clade H, clade I
and the clade L haplogroups are the major haplogroups that separate Pakistani
not in any of the Greek samples (Figure IX). However, the India-specific branch
H2*-APT was not present in any Pakistani ethnic group, but a low frequency (1.3%)
was observed in the Greek population (Firasat et al., 2007). The presence of the
India-specific sub-clade H2*-APT in the Greeks is the first time that this
haplogroup has been observed in any western European population and could
haplogroup (Rootsi et al., 2004). This result was also seen in our
analyses: 19.5% of Greeks carry the I-M170 Y chromosome (Figure IX). This
haplogroup was absent in the Burusho, Kalash and Pathan. Low contribution of this
Similarly, clade L* was observed only in Pakistani populations and was absent in the
Greeks (Figure IX). Like haplogroup H*, L*-M20 and R2-M124 are indigenous to
the Indus Valley and south west Asia. Clade L* has been suggested to be associated
with the spread of agriculture in the Indus Valley between 7000-2000 B.C. (Qamar et
al., 2002). All L*-M20 derived Y chromosomes in the Kalash population were
the sub-clade L3a (Figure IX). In the same way, R2-M124 was absent in Greeks
and was found at 14.4% in the Burusho and 5.74% in the rest of the Pakistani populations (Figure IX).
in North Africa, Middle East, and European countries (Semino et al., 2004). In the
the Greeks (2.5% and 21% respectively). Sub clade of E* haplogroup, E1b1b1a*-
the only branch that is present with low frequency in Pakistani populations (0.41%)
and high frequency in Greek population (17%). Among the three Pakistani
populations that claim Greek ancestry the Pathan were the only population in which a
low frequency of clade E1b1b1a*-M78 was present (2.1%) (Figure IX). Even more
compelling evidence in support of the genetic relationship between the Pathan and
(Figure XIII). One Pathan shared the same Y-STR haplotype, which included a
duplication of 10 and 13 repeats at the DYS425 locus, with three Greek individuals,
and the other was separated from this cluster by a single mutation, which enabled us
to estimate the time to the most recent common ancestor (TMRCA) (mean ± SD),
using the Network software, as between 2000 ± 400 and 5000 ± 1200 years before
present (YBP), depending upon the observed (Kayser et al., 2000) or inferred mutation
rates (Zhivotovsky et al., 2004). This coincides with the period of Alexander's
invasion during 327-323 B.C. In addition, this haplotype was not found in any other
in 53 individuals in the Y-STR Haplotype Reference Database (YHRD) (Kayser et al.,
2000) and was highly specific for the Balkans, the highest frequency being in
Macedonia.
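The dependence of a TMRCA estimate on the chosen mutation rate can be illustrated with a minimal sketch of a rho-type calculation, in which the average number of repeat-unit steps from an assumed founder haplotype is divided by the mutation rate summed over loci. This is not the implementation used by the Network software; the function name, the toy haplotypes and both rate values below are illustrative assumptions and are not data from this study.

```python
from statistics import mean


def rho_tmrca(haplotypes, founder, rate_per_locus, years_per_rate_unit):
    """Rho-type TMRCA estimate in years (illustrative sketch only).

    haplotypes: list of tuples of repeat counts, one value per Y-STR locus.
    founder: assumed ancestral haplotype (same loci, same order).
    rate_per_locus: assumed mutation rate per locus per rate unit.
    years_per_rate_unit: years represented by one rate unit.
    """
    n_loci = len(founder)
    # Repeat-unit steps separating each chromosome from the founder.
    steps = [sum(abs(a - b) for a, b in zip(hap, founder)) for hap in haplotypes]
    rho = mean(steps)
    return rho / (rate_per_locus * n_loci) * years_per_rate_unit


# Hypothetical four-locus data: five chromosomes, two of them one step
# away from the assumed founder haplotype.
founder = (13, 24, 10, 11)
sample = [founder, founder, founder, (14, 24, 10, 11), (13, 24, 10, 12)]

# Assumed rates for illustration only: a faster pedigree-type rate and a
# slower evolutionary-type rate, both expressed per locus per 25 years.
fast = rho_tmrca(sample, founder, rate_per_locus=2.8e-3, years_per_rate_unit=25)
slow = rho_tmrca(sample, founder, rate_per_locus=6.9e-4, years_per_rate_unit=25)
print(round(fast), round(slow))
```

Because the estimate scales inversely with the mutation rate, the slower rate gives a proportionally older date, which is why two rates bracket a range of ages rather than a single value.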
It is worth emphasizing here that the chance of picking up rare events largely
and Cruciani et al., (2006) also recommend caution when using microsatellite alleles
as surrogates of unique event polymorphisms. The genetic data alone do not tell us
to the historical record for this. There has been no known Greek admixture within the
admixture between the Greek slaves who were brought to this region by Xerxes
around one hundred and fifty years before Alexander's arrival, and the local
population, cannot be discounted (Firasat et al., 2007). At that time Afghanistan and
present day Pakistan were part of the Persian Empire (Wolpert, 2000). Nevertheless,
Alexander's army of 25,000-30,000 mercenary foot soldiers from Persia and West
Asia and 5,000-7,000 Macedonian cavalry (Engels, 1981) perhaps provides a more
likely explanation because of their elite status and substantial political impact on the
region.
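The caution about rare events can be made concrete under a simple binomial sampling assumption (an assumption of this sketch, not a method used in the study): if a lineage has population frequency f, the probability that a sample of n chromosomes contains at least one copy is 1 - (1 - f)^n. The 2.1% frequency is the value reported for E1b1b1a*-M78 in the Pathan; the sample size of 100 and the function names are hypothetical.

```python
import math


def detection_probability(frequency, sample_size):
    """Probability of sampling at least one copy of a lineage."""
    return 1.0 - (1.0 - frequency) ** sample_size


def sample_size_for(frequency, target=0.95):
    """Smallest sample size that reaches the target detection probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - frequency))


# Hypothetical sample of 100 chromosomes and a lineage at 2.1% frequency.
p = detection_probability(0.021, 100)
n = sample_size_for(0.021)
print(f"P(at least one copy in 100 chromosomes) = {p:.2f}")
print(f"chromosomes needed for 95% detection = {n}")
```

Under this assumption, failing to observe a low-frequency lineage in a small sample is weak evidence of its absence, and observing a single copy gives only a rough estimate of its frequency.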
frequency in the Greek population (Firasat et al., 2007; Francalacci et al., 2003;
Hammer et al., 2001). Our results have shown that the high frequency of clade H1*-
M52 and L3a-PK3 (20.45% and 22.7% respectively) and the lack of clade E* in the
gene pool of Kalash, make the Kalash distinct from the Greeks (Figure IX).
The statistical analysis of results has also shown the highest pair-wise genetic
distance [ST (0.213) and (8.066)] values for the Kalash population (Table XI).
Moreover, the Kalash form a distinct cluster in the principal component analysis
(Figure X). On the basis of these results it is thus concluded that the true Greek
The presence of a unique population specific L3a-PK3 haplogroup in Kalash
the median TMRCA for the Kalash L3a lineages as 970 YBP (200-3500 YBP). This
coincides with the arrival of the Kalash from Afghanistan into the Chitral Valley in
northern Pakistan during the tenth and eleventh century AD (Lines, 1999).
principal component analysis placed Burusho as being distinct from the Greek and
closer to their neighbors in Pakistan (Figure X), suggesting that the linguistic
differences arose after the common Y pattern was established. Alternatively, there
may have been sufficient Y gene flow between populations to eliminate any initial
provides evidence in support of the Greek origins for a very small proportion of
Pathan as demonstrated by clade E* network (Figure XIII) and low pair-wise genetic
distances between these two populations (Table XI). The contribution to the Kalash
requires the assumption that extant Greeks are representative of Alexander's armies.
The failure to find a conclusive Y link with the extant Greek population could also be
attributed to the fact that besides the 5000-7000 strong Macedonian cavalry,
Persia and West Asia (Engels, 1981) and populations from Pakistan have been
shown to be closer to those from West Asia (Qamar et al., 2002; Quintana-Murci et
al., 2001).
PART 3
the published haplogroup frequency data at similar molecular resolution. Table XII
provides information about the Asian reference populations that were used in this
analysis.
haplogroup C*, haplogroup J*, haplogroup L*, and haplogroup R*, which together
account for 85.5% of the total Y chromosomes of the Pakistani population (Table VI). The
47.5% (including all the derivatives) of the total Pakistani population. The worldwide
data on the Y chromosome show that the R* haplogroup is present at high frequency
the Figure XV adapted from Gayden et al., 2007, the Kyrgyz Y chromosomes in
Central Asia have more than 50% haplogroup R*. The frequency gradually
decreases in the Karakalpak (34%) and Kazak (11%). In west Asia the highest
(25.6%), Syria (25%), Iraq (17.3%) and Lebanon (6%). Haplogroup R* is found in the
The second most abundant major clade is haplogroup J*, which occurs with
Bank, Jordan, Lebanon, Syria and Iraq: Semino et al., 2004). The high frequencies
among populations of the Middle East, North Africa and East Africa provide evidence
al., 1999). However, J2* originated in northern part of the Fertile Crescent. The
presence of this haplogroup in Europe and in India, Pakistan and in Nepal reveals
that haplogroup J2* expanded in both east and west directions (Al-Zahery et al.,
(Flores et al., 2005), 37.2%/9.9% in Oman, 19.7%/12.2% in Egypt (Luis et al., 2004),
9.2%/24.3% in Turkey (Cinnioglu et al., 2004), 31%/26.6% in Iraq (Al-Zahery et al.,
2003), 13.8%/18.9% in Iran (Nasidze et al., 2004; Underhill et al., 2000; Wells et al.,
2001), 16.3%/29.8% in Lebanon (Hammer et al., 2000; Semino et al., 2004; Wells et
al., 2001), 32.4%/22.5% in Syria (Cruciani et al., 2004; Di Giacomo et al., 2004;
al., 2000; Nebel et al., 2001), 2.5% / 0.5% in Somalia (Sanchez et al., 2005) and
found on the Indian subcontinent, Sri Lanka and in parts of SE Asia. The C1*
haplogroup is found at low frequency in Japan, while C2* is found predominantly in New
southeast or central Asia. From central Asia this haplogroup expanded towards
northern Asia and the Americas, and low concentrations are also found in eastern
and central Europe, where it may represent evidence of the westward expansion of
the Huns in the early Middle Ages. C4* is found among aboriginal Australians and a
The Hazara are an ethnic group in Pakistan that claim to be
suggested that C3 chromosome spread widely during the time when Genghis Khan
Hazara and 8.2% of Burushos (Table VI). In a study conducted by Zerjal et al. (2003)
the median-joining network (Bandelt et al., 1999) links the Hazara population to the
male descendants of Genghis Khan (Figure XVI). This is due to the presence of the
the star haplotype was not observed in Burusho population indicating separate origins
migrated south and reached the rugged, mountainous Pamir Knot region. Their L*
haplogroup may have arisen about 30,000 years ago and represents the earliest
known as the Indian Clan. Today, the L* haplogroup is found primarily as sub-group
L1 in India and Sri Lanka. Sub-group L3* is found mostly in Pakistan. Haplogroup L*
can also be found in low frequencies in the Middle East and in Europe along the
Mediterranean coast.
Sengupta et al. (2006), Thamseem et al. (2006), Cordaux et al. (2004) and
Basu et al. (2003) reveal that 7-15% of Indian males have the L* haplogroup, while 10.8% of
Pakistani males carry this haplogroup (present study). As shown in Figure XVII and
in the work of Wells et al. (2001), a higher frequency of haplogroup L* was
present in South India and western Pakistan than in south Pakistan. However a low
haplogroup L* absent in east India. A low frequency was found in Oman (0.8%: Luis
et al., 2004), Iraq (1%: Al-Zahery et al., 2003), Lebanon (2%: Hammer et al., 2000;
Semino et al., 2004; Wells et al., 2001), and Greece (1.1%: Di Giacomo et al., 2003;
African population (Knight et al., 2003). This haplogroup appears at low frequency all
around Africa, but is at its highest frequency in Pygmy populations. In the current study,
two Pakistani males, i.e. one who belongs to the Dravidian-speaking Brahui population
and a second who belongs to the Makrani-Negroid population from the south.
Median-joining network (Bandelt et al., 1995) for the M60 derived Y haplotypes for
DYS19, 389I, 389b, 390 and 392 revealed that the Brahui sample (Y-STR haplotype
14_11_18_24_13) differed from three Sukuma individuals (Knight et al., 2003) at the
DYS19 locus only (16_11_18_24_13) (Figure XVIII). However, the Makrani Negroid
Hadzabe population at the 389b, 390, and 392 loci (15_10_17_20_13) (Table XIII).
The time of separation between the populations, estimated by the software Network
(Bandelt et al., 1995) was approximately 5000-10,000 years. These results exclude
an ancient migration and suggest that a more recent migratory event is responsible
legacy of the slave trade that existed between the southern coast of Pakistan and
East Africa.
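The haplotype comparison described above can be sketched as a short routine that lists the loci at which two five-locus haplotypes differ and by how many repeats. The haplotype strings are those quoted in the text for the Brahui and Sukuma samples; the function names are illustrative.

```python
LOCI = ("DYS19", "DYS389I", "DYS389b", "DYS390", "DYS392")


def parse_haplotype(text):
    """Convert a string such as '14_11_18_24_13' into a tuple of repeat counts."""
    return tuple(int(value) for value in text.split("_"))


def locus_differences(hap_a, hap_b, loci=LOCI):
    """Return {locus: repeat difference (b minus a)} for loci that differ."""
    a, b = parse_haplotype(hap_a), parse_haplotype(hap_b)
    return {locus: y - x for locus, x, y in zip(loci, a, b) if x != y}


brahui = "14_11_18_24_13"
sukuma = "16_11_18_24_13"
print(locus_differences(brahui, sukuma))  # {'DYS19': 2}
```

The two haplotypes differ only at DYS19, by two repeat units, which is the single-locus difference reported for the Brahui and Sukuma chromosomes.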
men in East and Southeast Asia carry this haplogroup; however, a low frequency
In comparison with worldwide data, it is suggested that the gene pool of
the populations of the east and south east Asia. It is illustrated by the presence of
frequently found haplogroups such as J* and R* also contribute to the western Asian
and European gene pools but are not found in China and Japan. However, the low
indicates that the Karakoram Mountains, which separate Pakistan and China, form a
formidable barrier to gene flow from the north. The Hazara, 60% of whom carry
the C3 Y chromosome, are the only population that shows significant East Asian (Mongolian) ancestry,
but historical records indicate that they did not cross this geographical boundary and
Table XII: Description of world populations.
Populations are grouped by region: Middle East, Central Asia, South Asia, Northeast Asia and Southeast Asia.
Figure XV: The frequencies of major haplogroups in Asian populations.
Figure XVI. Median-joining network of C* lineages. The central star-cluster
DYS389b-DYS390-DYS391-DYS392-DYS393-DYS388-DYS425-DYS426-
Figure XVII: Distribution of L* haplogroup in Indo Pak sub continent.
Table XIII: Y-STR data of clade B* lineages in Pakistani and African
populations.
Columns: haplotype ID; Y-STR haplotype (DYS19_389I_389b_390_392); counts in the populations sampled (Biaka, Brahui, Hadzabe, Lisongo, Makrani Negroid, Mbuti, San, Sukuma); TOTAL
H1 14_11_15_25_14 1 1
H2 14_11_18_24_13 1 1
H3 15_10_14_21_13 2 2
H4 15_10_15_20_13 1 1
H5 15_10_15_22_13 7 7
H6 15_10_17_20_13 1 1
H7 15_10_18_21_11 1 1
H8 15_11_16_23_13 1 1
H9 16_10_15_24_13 2 1 1
H10 16_11_13_24_13 2 2
H11 16_11_14_24_13 1 1
H12 16_11_15_25_13 1 1
H13 16_11_16_20_13 1 1
H14 16_11_16_23_13 1 1
H15 16_11_18_24_13 3 3
H16 16_7_14_24_13 1 1
H17 16_7_15_24_13 1 1
H18 16_7_16_24_13 1 1
H19 17_11_13_24_13 1 1
H20 17_11_14_24_13 1 1
H21 17_11_16_20_13 1 1
H22 17_7_16_24_13 1 1
H23 18_11_16_23_13 1 1
Figure XVIII: Median-joining network of clade B* lineages in Pakistan and
Figure XIX: Geographic distribution of haplogroup O3.
PART 4
migrations from the west over the centuries. Present day Pakistan is bordered by
Iran and Afghanistan on the west, India towards the east and China in the north. The
BALTI:
The Balti reside in eastern Baltistan in northern Pakistan, and there are
Tibetan language and they are thought to have originated in Tibet. However, not all
Balti speakers that are found in Pakistan are of Tibetan stock. With the passage
of time many other populations that entered their territory, such as the Shins, Arabs,
Persians and Turks, gradually mixed with the Balti people. Although this study
analyzed only a few unrelated Balti samples, Y lineages
commonly found in Tibet were not observed. Clade D*, which is present at high frequency in the Tibetan
population, was not observed in the Balti (Table VI). The results were consistent with
HAZARA:
Hazara individuals have typical Mongolian features and they claim to be descendants
of Genghis Khan's army. Their name is derived from the Persian word hazar,
meaning thousand, because troops were left behind in detachments of a thousand
(Qamar et al., 2002). An earlier study done on a limited number of samples (n = 33)
has shown them to be closer to populations in Mongolia (Qamar et al., 2002) and the
star Y-STR haplotype (Figure XVI) observed in this population suggested that they
The present study analyzed a much larger population sample (n =224) from a
wider geographical area in Pakistan. The earlier samples were collected from NWFP
and the additional samples were from Quetta, Baluchistan. Two haplogroups
(Table VI). Haplogroup R* is also present at high frequency in other ethnic groups of
Pakistan (53.5%, when the Hazara are excluded). However, haplogroup C* is rare in
are excluded. This haplogroup is fairly common in Central Asia and Mongolia and
points towards the Mongol origins of the Hazara population (Figure XXI).
BURUSHO:
The Burusho, who speak Burushaski, are of particular genetic, linguistic and
isolates in the world (Dani, 1991; Grimes, 1992). Approximately 60,000 Burusho are
estimated to reside in present day Pakistan. The samples used here were collected
from the valleys of the Karakorum Mountains in Hunza, Nagar and Yasin. The origin
of Burusho is not well documented. Some claim they are descendants of four
generals in Alexander's army (Dani, 1989). Others believe them to be Dardics from
Central Asia, or nomads from Pamir, who migrated to this area, and displaced the
Studies with autosomal (Ayub et al., 2003; Mansoor et al., 2004) and Y
chromosomal markers (Firasat et al., 2007) suggest that the Burusho have the same
genetic makeup as their geographical neighbours in Pakistan. A preliminary study by
Wells et al. using a limited number of Y markers showed that the Hunza Burusho
clustered with populations from Tajikistan (Wells et al., 2001), but no such
evidence was found using a larger number of markers. The high frequencies of Central Asian
haplogroup C* chromosomes in the Burusho and Hazara indicate that these arose in
Central Asia before the separation of these two Pakistani populations (Mohyuddin et
al., 2006). There is also no evidence of genetic relatedness with the Greeks.
Haplogroup C* is absent in Greeks (Francalacci et al., 2003; Rootsi et al., 2004), and
Although they share R1a1* haplogroups, the branch derived from R1a1* that
microsatellite variation.
KALASH:
The Kalash have been isolated for centuries in the Hindu Kush mountain
ranges of northern Pakistan. Their language, Kalasha, belongs to the Dardic group of
Oral traditions ascribe their origins to a mythical place called Tsiam, which some
claim refers to Syria (Decker, 1992). Various scholars have attributed their origins to
population (Francalacci et al., 2003; Hammer et al., 2001) and the presence of clade
H* (20%) and L3a (23%) make the Kalash distinct from the Greeks (Firasat et al.,
that they have a predominantly European component and their possible origin is
paternal (Y chromosome SNP and STR) (Qamar et al., 2002) and autosomal STR
(Mansoor et al., 2004) has also demonstrated their greater affinity with European
populations. In the principal component analyses based on haplogroup frequencies,
the Kalash are distinct from the other ethnic groups of Pakistan (Figure X). The
genetic drift (L3a) in this population. The timing of their isolation can be better
studied by analyzing populations from Nuristan, Afghanistan from where they are
frequencies in the Burusho, Kalash and the Pathan based on 16 Y-STRs also shows
a high degree of Kalash specific substructure. Except for one individual all the
Kalash samples fall in one cluster. From the network it appears that H1*-M52 spread
to neighboring northern populations. Taken together these results suggest that the
high frequency of unique population specific SNPs and haplogroups in this group are
probably due to genetic drift in a population that has been isolated for centuries in the
PATHAN:
The last of the northern populations with claims to Greek origins, the Pathans, occupy
vast tracts of land in Pakistan and neighbouring Afghanistan. In Pakistan the vast
majority of Pathans reside in the NWFP and Baluchistan provinces. The
Pathan populations and are the important centers of Pathan in Pakistan. According
population of present day Pakistan. Their language, Pashtu, is classified under the
Figure XX: Median-joining network H1*-M52 lineage fall in Burusho, Kalash
claim that either they are of Jewish origin (Ahmed, 1952) or are descendants of
chromosome that is present at high frequency in Greeks (Figure IX) provides
evidence of a small Greek contribution to the Pathan gene pool that will likely require
However, earlier studies carried out by Quintana-Murci et al. (2004) and Mansoor et al. (2004)
using mitochondrial DNA and STR markers demonstrated that the Pathans are
mainly related to the Iranians and their geographic neighbors in northern Pakistan.
PARSI:
The origins of the Parsi are well-documented and there are only a few
thousand Parsi inhabitants in Pakistan now. These followers of the Persian Prophet
the collapse of the Sassanian Empire in the 7th century A.D. and settled in the
northwest Indian province of Gujarat in 900 A.D. where they were called the Parsi
___ meaning from Iran. Eventually they moved to Mumbai in India and Karachi in
Pakistan, from where the present population was sampled (Figure XXI). They speak
an Indo-European language.
The earlier study of their Y chromosomes (Qamar et al., 2002) showed that
the Parsis are genetically closer to Iranians than to their neighbors in Pakistan. In
this study, 39% of the Parsis sampled belonged to haplogroup J* (Table VI). This is
similar to the frequency of this haplogroup (40%) in the present day Iranian
population (Qamar et al., 2002). Surprisingly, based upon their mitochondrial DNA
variation, the Parsis were genetically close to the Gujarati population of India (Quintana-
Murci et al., 2004) rather than to the Iranians, indicating a loss of mitochondrial DNA
of Iranian origin, mainly due to their admixture with the local population in India after
their seventh century migration.
BALOCH:
Balochis are affiliated with the Iranian Baloch tribes across the southwest
border with Iran, and these people speak Balochi, an Indo-Aryan
Researchers are unsure of their origins. Some scholars believe that they belong to
the northern regions of Elburz, a mountain range in North Iran, whereas others claim
the Baloch Y chromosomes carry the haplogroup R* and only 9% carry haplogroup
J* (Table VI). These results support the earlier observation (Qamar et al., 2002) that
used a limited number of Y markers. HLA data supports genetic relatedness among
the Baloch tribes of Iran and Pakistan (Farjadian et al., 2004). In worldwide surveys
of the HGDP-CEPH cell line panel, the Baloch are closely related to their
geographic neighbours and share the same branch as populations from the Middle
BRAHUI:
Pakistan. About 1.5 million Brahuis reside in the Sarawan and Jhalawan regions of Kalat
state, Baluchistan (Hughes-Buller, 1991). They speak the Brahui language, which belongs
to the Dravidian language family (Grimes, 1992). Dravidians are found mostly in
southern India, Sri Lanka, Bangladesh, Pakistan, Afghanistan and Iran. Dravidians
Dravidian hypothesis, they originated in the Iranian province of Elam and were once
spread over a much larger area, including Iran, Pakistan, Afghanistan and all India
(McAlpin, 1974, 1981). According to some historical traditions, Brahuis are the
descendants of western Asian people (McAlpin, 1974, 1981) such as Turko-Iranian
tribes and Scythians (Hughes-Buller, 1991). Some historians also claim that they
have the same origins as the Baloch (Hughes-Buller, 1991; Quddus, 1990).
entered South Asia with the expansion of Dravidian-speaking farmers (Quintana-
Murci et al., 2001).
In order to investigate their origin, a set of 117 Brahui Y chromosomes was
analyzed. The results of the present study were compared with those of neighboring populations.
(Table VI) reveal the movement of population from west Asia to south Asia and from
60%: Quintana-Murci et al., 2001), and in the Fertile Crescent region, which includes
Palestinians (51%), Lebanese (46%) and Syrians (57%) (Hammer et al., 2000). These
results indicate that haplogroup J* originated in west Asia and from there they
(26.5%) also confirmed these observations. The major movement of population from
west Asia to south Asia is correlated with the expansion of the farming economy that
started between the 6th and 5th millennia B.C. from Iran to the Indo-Pak subcontinent. After
this, the other major development was the expansion of domesticated animals by
pastoral nomads. The expansion of haplogroup J* has probably been associated with
the dispersal of farmers and pastoral nomads (Dravidian) in southern Asia (Cavalli-
Sforza, 1988; Renfrew, 1987). However, Sengupta et al. (2006) suggest the origin of
carry the L1-M76 haplogroup suggests that the Brahui could have migrated to Baluchistan
from India. It is also supported by the mean microsatellite variance, which is higher in
MAKRANI NEGROID:
The Negroid Makrani have African physical traits and reside along the southern
been speculated that they represent migrants from Africa (Figure XXI) but the timing
of this migration is uncertain (Ansari, 1996). Although they do have low frequency of
proportion of L*, J* and R*. L* haplogroup are mostly restricted to the Indo-Pak
al., 2002) and mitochondrial DNA data supported these results. These data, along with
their history as remnants of the east African slave trade, indicated that they were
Figure XXI: Possible origins of the a) Hazara (Mongolia), b) Kalash (West Eurasian Y lineages), c) Parsi (Iran, Gujrat, Mumbai) and d) Makrani Negroid.
CONCLUSIONS:
understanding of human ancestry and diversity from both the maternal and paternal
migration from west Asia, Europe and to a lesser extent from East Asia has resulted in
a rich tapestry of socio-cultural, linguistic and biological diversity. This study provides
population with respect to each other as well as the other world population. These
incidence and prognosis of various diseases across different populations. The study
will provide major insights where a patient's origin will be useful in determining the
composition will also be helpful in eliminating any spurious risk factors for different
diseases. Furthermore, apart from the inherited diseases, the study will be of
infectious diseases as well as the efficacy of drug treatment, heralding the era of
genomic medicine.
REFERENCES
Ahmad AKN. (1952). Jesus in heaven on earth. The Civil and Military Gazette Ltd,
Lahore, Pakistan.
Aitman TJ, Dong R, Vyse TJ, Norsworthy PJ, Johnson MD, Smith J, Mangion J,
Roberton-Lowe C, Marshall AJ, Petretto E, Hodges MD, Bhangal G, Patel SG,
Sheehan-Rooney K, Duda M, Cook PR, Evans DJ, Domin J, Flint J, Boyle JJ, Pusey
CD and Cook HT. (2006). Copy number polymorphism in Fcgr3 predisposes to
glomerulonephritis in rats and humans. Nature. 439:851-855.
Anderson S, Bankier AT, Barrell BG, De Bruijn MHL, Coulson AR, Drouin J, Eperon
IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R and Young
IG. (1981). Sequence and organization of the human mitochondrial genome. Nature.
290:457-465.
Ansari SSA. (1996). The Afghan or Pathans. In: The Musalman races found in
Sindh, Baluchistan and Afghanistan. Indus Publications, Karachi. pp 9-16.
Bandelt HJ, Forster P and Rohl A. (1999). Median-joining networks for inferring
intraspecific phylogenies. Mol Biol Evol. 16:37-48.
Barley J, Blackwood A, Miller M, Markandu ND, Carter ND, Jeffery S, Cappuccio FP,
MacGregor GA and Sagnella GA. (1996). Angiotensin converting enzyme gene I/D
polymorphism, blood pressure and the renin-angiotensin system in Caucasians and
Afro-Caribbean peoples. J Hum Hypertens. 10:31-35.
Batzer MA, Kilroy GE and Richard PE. (1990). Structure and variability of recently
inserted Alu family members. Nucleic Acids Res. 18:6793-6798.
Batzer MA, Gudi VA, Mena JC, Foltz DW, Herrera RJ and Deininger PL.(1991).
Amplification dynamics of Human-specific (HS) Alu family members. Nucleic Acids
Res.19:3619-3623.
Batzer MA, Arcot SS, Phinney JW, Alegria-Hartman M, Kass DH, Milligan SM,
Kimpton C, Gill P, Hochmeister M, Ioannou PA, Herrera RJ, Boudreau DA, Scheer
WD, Keats BJ, Deininger PL and Stoneking M. (1996). Genetic variation of recent Alu
insertions in human populations. J Mol Evol. 42:22-29.
Batzer MA and Deininger PL.(2002). Alu repeats and human genomic diversity. Nat
Rev Genet. 3:370-379.
Birnboim HC and Straus NA. (1975). DNA from eukaryotic cells contains unusually
long pyrimidine sequences. Can J Biochem. 53:640-643.
Bowcock AM, Kidd J, Mountain JL, Hebert JM, Carotenuto L, Kidd KK and Cavalli-
Sforza LL. (1991). Drift, admixture, and selection in human evolution: a study with
DNA polymorphisms. Proc Natl Acad Sci. USA 88: 839-843.
Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K,
Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA,
Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper
PS, Shaw DJ and Housman DE. (1992). Molecular basis of myotonic dystrophy:
expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a
protein kinase family member. Cell. 68:799-808.
Brooks MB, Gu W, Barnas JL, Ray J and Ray K. (2003). A LINE 1 insertion in the
Factor IX gene segregates with mild hemophilia B in dogs. Mamm Genome. 14:788-
795.
Brown P, Sutikna T, Morwood MJ, Soejono RP, Jatmiko, Saptomo EW, Due RA.
(2004). A new small-bodied hominin from the Late Pleistocene of Flores, Indonesia.
Nature. 431:1055-1061.
Brown WM, George M Jr and Wilson AC. (1979). Rapid evolution of animal
mitochondrial DNA. Proc Natl Acad Sci. USA 76:1967-1971.
Budowle B, Moretti TR, Niezgoda SJ and Brown BL. (1998). CODIS and PCR-
based short tandem repeat loci: Law enforcement tools. In: Second European
Symposium on Human Identification 1998, Promega Corporation, Madison,
Wisconsin pp 73-88.
Budowle B and Chakraborty R. (2001). Population variation at the CODIS core
short tandem repeat loci in Europeans. Leg Med (Tokyo) 3:29-33.
Cann RL, Stoneking M and Wilson AC.(1987). Mitochondrial DNA and human
evolution. Nature. 325: 31-36.
Carter NP.(2007). Methods and strategies for analyzing copy number variation using
DNA microarrays. Nat. Genet. 39: Suppl: S16-S21.
Cavalli-Sforza LL, Menozzi P and Piazza A. (1994). The History and Geography of
Human Genes. Princeton University Press, Princeton.
Cavalli-Sforza LL.(2005). The Human Genome Diversity Project: past, present and
future. Nat Rev Genet. 6:333-40.
Cooper DN and Krawczak M.(1995). An introduction to the structure, function and
expression of human genes. In: Human gene mutation. Bios Scientific Publishers
Limited. UK. pp 19-48.
Csink AK and Henikoff S. (1998). Something from nothing: the evolution and utility
of satellite repeats. Trends Genet. 14:200-204.
Dani AH. (1989). Early history: the early inhabitants. In: History of Northern Areas of
Pakistan. National Institute of Historical and Cultural Research, Islamabad, Pakistan.
pp 110-157.
Deininger PL and Slagel VK. (1988). Recently amplified Alu family members share
a common parental Alu sequence. Mol Cell Biol. 8:4566-4569.
Deininger PL, Batzer MA, Hutchinson III CA and Edgell MH. (1992). Master genes
in mammalian repetitive DNA amplification. Trends Genet. 8:307-312.
de Knijff P. (2000). Messages through bottlenecks: on the combined use of slow
and fast evolving polymorphic markers on the human Y chromosome. Am J Hum
Genet. 67:1055-1061.
Dietrich W, Katz H, Lincoln SE, Shin H-S, Friedman J, Dracopoli NC and Lander
ES. (1992). A genetic map of the mouse suitable for typing intraspecific crosses. Genetics.
131:423-447.
Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin M and Freimer NB. (1994).
Mutational processes of simple-sequence repeat loci in human populations. Proc Natl
Acad Sci. USA 91:3166-3170.
Dong SL, Wang E, Hsie L, Cao YX, Chen XG, Gingeras TR.(2001). Flexible use of
high density oligonucleotide arrays for single nucleotide polymorphism discovery and
validation. Genome Res. 11:1418-1424.
Engels DW.(1981). Alexander the Great and the logistics of the Macedonian Army.
Berkeley, CA: University of California Press.
Epplen JT, McCarrey JR, Sutou S and Ohno S. (1982). Base sequence of a cloned
snake W-chromosome DNA fragment and identification of a male-specific putative mRNA in
the mouse. Proc Natl Acad Sci. USA 79:3798-3802.
Farjadian S, Naruse T, Kawata H, Ghaderi A, Bahram S, Inoko H.(2004).
Molecular analysis of HLA allele frequencies and haplotypes in Baloch of Iran
compared with related populations of Pakistan. Tissue Antigens. 6:581-587.
Feng Q, Moran JV, Kazazian HH Jr and Boeke JD. (1996). Human L1 retrotransposon
encodes a conserved endonuclease required for retrotransposition. Cell 87:905-916.
Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW.
(2005). Discovery of human inversion polymorphisms by comparative analysis of
human and chimpanzee DNA sequence assemblies. PLoS Genet. 1:489-498.
Fisher EM, Beer-Romero P, Brown LG, Ridley A, McNeil JA, Lawrence JB, Willard
HF, Bieber FR, Page DC.(1990). Homologous ribosomal protein genes on the human
X and Y chromosomes: escape from X inactivation and possible implications for
Turner syndrome. Cell 63:1205-1218.
Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJM,
Holden JH, Fenwick RG, Warren ST, Oostra BA, Nelson DL and Caskey CT. (1991).
Variation of the CGG repeat at the fragile X site results in genetic
instability: resolution of the Sherman paradox. Cell. 67:1047-1058.
Fu Y-H, Pizzuti A, Fenwick RGJr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser
GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, and
Caskey CT.(1992). An unstable triplet repeat in a gene related to myotonic muscular
dystrophy. Science. 255:1256-1258.
Gilbert N, Lutz-Prigge S and Moran JV. (2002). Genomic deletions created upon
LINE-1 retrotransposition. Cell 110:315-325.
Giles RE, Blanc H, Cann HM and Wallace DC. (1980). Maternal inheritance of human
mitochondrial DNA. Proc Natl Acad. Sci. USA 77:6715-6719.
Gill P, Ivanov PL, Kimpton C, Piercy R, Benson N, Tully G, Evett I, Hagelberg E and
Sullivan K.(1994). Identification of the remains of the Romanov family by DNA
analysis. Nat Genet. 6:130-135.
Grubb R and Laurell AB.(1956). Hereditary serological human serum groups. Acta
Pathol Microbiol Scand. 39:390-398.
Hacia JG, Fan J-B, Ryder O, Jin L, Edgemon K, Ghandour G, Mayer RA, Sun B,
Hsie L, Robbins CM, Brody LC, Wang D, Lander ES, Lipshutz R, Fodor SPA and
Collins FS. (1999). Determination of ancestral alleles for human single-nucleotide
polymorphisms using high-density oligonucleotide arrays. Nat Genet. 22:164-167.
Hamada H and Kakunaga T.(1982). Potential Z-DNA forming sequences are highly
dispersed in the human genome. Nature. 298:396-398.
Hammer MF, Spurdle AB, Karafet T, Bonner MR, Wood ET, Novelletto A, Malaspina
P, Mitchell RJ, Horai S, Jenkins T and Zegura SL.(1997). The geographic
distribution of human Y chromosome variation. Genetics.145:787-805.
Hammer MF, Karafet TM, Rasanayagam A, Wood ET, Altheide TK, Jenkins T,
Griffiths RC, Templeton AR and Zegura SL.(1998). Out of Africa and back again:
Nested cladistic analysis of human Y chromosome variation. Mol Biol Evol. 15: 427-
441.
Hammer MF, Redd AJ, Wood ET, Bonner MR, Jarjanazi H, Karafet T, Santachiara-
Benerecetti S, Oppenheim A, Jobling MA, Jenkins T, Ostrer H and Bonne-Tamir
B.(2000). Jewish and Middle Eastern non-Jewish populations share a common pool
of Y-chromosome biallelic haplotypes. Proc Natl Acad Sci. 97: 6769-6774.
Hammer MF, Karafet TM, Park H, Omoto K, Harihara S, Stoneking M, and Horai
S.(2006). Dual origins of the Japanese: Common ground for hunter-gatherer and
farmer Y chromosomes. J Hum Genet. 51:47-58.
Harris H. (1966). Enzyme polymorphism in man. Proc R Soc Lond B Biol Sci.
22:298-310.
Hearne CM, Ghosh S and Todd JA. (1992). Microsatellites for linkage analysis of
genetic traits. Trends Genet. 8: 288-294.
Hinds DA, Kloek AP, Jen M, Chen X and Frazer KA.(2006). Common deletions and
SNPs are in linkage disequilibrium in the human genome. Nat Genet. 38:82-85.
Hudjashov G, Kivisild T, Underhill PA, Endicott P, Sanchez JJ, Lin AA, Shen P,
Oefner P, Renfrew C, Villems R and Forster P.(2007). Revealing the prehistoric
settlement of Australia by Y chromosome and mtDNA analysis. Proc Natl Acad. Sci.
104:8726-8730.
Hurles ME, Nicholson J, Bosch E, Renfrew C, Sykes BC and Jobling MA. (2002). Y
chromosomal evidence for the origins of Oceanic-speaking peoples. Genetics. 160:
289-303.
Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW. and
Lee C.(2004). Detection of large-scale variation in the human genome. Nat Genet.
36:949-951.
Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA,
Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor
BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M,
Cann HM, Hardy JA, Rosenberg NA, Singleton AB.(2008). Genotype, haplotype and
copy-number variation in worldwide human populations. Nature 451:998-1003.
Jeffreys AJ, Wilson V and Thein SL. (1985). Individual-specific 'fingerprints' of
human DNA. Nature. 316:76-79.
Jeffreys AJ, Royle NJ, Wilson V and Wong Z. (1988). Spontaneous mutation rates to
new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature.
332:278-281.
Jeng JR, Harn HJ, Jeng CY, Yueh KC and Shieh SM.(1997). Angiotensin I
converting enzyme gene polymorphism in Chinese patients with hypertension. Am J
Hypertens. 10: 558-561.
Jorde LB, Bamshad MJ, Watkins WS, Zenger R, Fraley AE, Krakowiak PA,
Carpenter KD, Soodyall H, Jenkins T and Rogers AR. (1995). Origins and affinities of
modern humans: a comparison of mitochondrial and nuclear genetic data. Am J Hum
Genet. 57: 523-538.
Kajikawa M and Okada N.(2002). LINEs mobilize SINEs in the eel through a shared
3' sequence. Cell 111:433-444.
Kan YW and Dozy AM. (1978). Polymorphism of DNA sequence adjacent to human
β-globin structural gene: relationship to sickle mutation. Proc Natl Acad Sci. USA
75:5631-5635.
Kapitonov V and Jurka J.(1996). The age of Alu subfamilies. J Mol Evol. 42:59-65.
Karafet TM, Osipova LP, Gubina MA, Posukh OL, Zegura SL, and Hammer MF.
(2002). High levels of Y-chromosome differentiation among native Siberian
populations and the genetic signature of a boreal hunter-gatherer way of life. Hum
Biol. 74: 761-789.
Karafet TM, Lansing JS, Redd AJ, Reznikova S, Watkins JC, Surata SP,
Arthawiguna WA, Mayer L, Bamshad M, Jorde LB, Hammer MF.(2005). Balinese Y-
chromosome perspective on the peopling of Indonesia: Genetic contributions from
pre-Neolithic hunter-gatherers, Austronesian farmers, and Indian traders. Hum Biol.
77: 93-114.
Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer
MF.(2008). New binary polymorphisms reshape and increase resolution of the
human Y chromosomal haplogroup tree. Genome Res. 18:830-838.
Kimmel M and Chakraborty R.(1996). Measure of variation at DNA repeat loci under
a generalized stepwise mutation model. Theor Pop Biol. 50:345-367.
King TE, Bowden GR, Balaresque PL, Adams SM, Shanks ME and Jobling MA.
(2007). Thomas Jefferson's Y chromosome belongs to a rare European lineage. Am
J Phys Anthropol. 132:583-589.
Klein RG. (1989). The Human Career: Human Biological and Cultural Origins.
Chicago: University of Chicago Press.
Knight A, Batzer MA, Stoneking M, Tiwari HK, Scheer WD, Herrera RJ and Deininger
PL. (1996). DNA sequences of Alu elements indicate a recent replacement of the
human autosomal genetic complement. Proc Natl Acad. Sci. USA 93: 4360-4364.
Knight A, Underhill PA, Mortensen HM, Zhivotovsky LA, Lin AA, Henn BM, Louis D,
Ruhlen M, Mountain JL.(2003). African Y chromosome and mtDNA divergence
provides insight into the history of click languages. Curr Biol. 13:464-473.
Koschinsky ML, Boffa MB, Nesheim ME, Zinman B, Hanley AJG, Harris SB, Cao H
and Hegele RA.(2001). Association of a single nucleotide polymorphism in CPB2
encoding the thrombin-activatable fibrinolysis inhibitor (TAFI) with blood pressure. Clin
Genet. 60:345-349.
Kruse PE Jr and Patterson MK. (1973). Tissue Culture: Methods and Applications.
Academic Press, New York. pp 16-17.
Lahr MM and Foley RA.(1994). Multiple dispersals and modern human origins.
Evolutionary Anthropology. 3: 48-60.
La Spada AR, Wilson AM, Lubahn DB, Harding AE and Fischbeck KH. (1991).
Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy.
Nature. 352:77-79.
Leakey R. (1994). The origin of humankind. Basic Books, A Division of
HarperCollins, New York.
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM,
Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM.(2008). Worldwide human
relationships inferred from genome-wide patterns of variation. Science. 319:1100-4.
Litt M and Luty JA.(1989). A hypervariable microsatellite revealed by in vitro
amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am J
Hum Genet. 44:397-401.
Lucotte G and Ngo NY.(1985). p49f, A highly polymorphic probe, that detects Taq1
RFLPs on the human Y chromosome. Nucleic Acids Res. 13:8285.
Ludwig E, Corneli PS, Anderson JL, Marshall HW, Lalouel JM and Ward RH.
(1995). Angiotensin-converting enzyme gene polymorphism is associated with
myocardial infarction but not with development of coronary stenosis. Circulation
91:2120-2124.
Luis JR, Rowold DJ, Regueiro M, Caeiro B, Cinnioğlu C, Roseman C, Underhill PA,
Cavalli-Sforza LL, Herrera RJ.(2004). The Levant versus the Horn of Africa:
evidence for bidirectional corridors of human migrations. Am J Hum Genet. 74:532-
44.
Malik HS, Burke WD and Eickbush TH. (1999). The age and evolution of non-LTR
transposable elements. Mol Biol Evol. 16:793-805.
Marri MKBB.(1985). Search lights on Baloch and Balochistan. 3rd Edition. Nisa
traders, Quetta, Pakistan.
Mathias SL, Scott AF, Kazazian HH Jr, Boeke JD and Gabriel A. (1991). Reverse
transcriptase encoded by a human transposable element. Science. 254:1808-1810.
McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S,
Gabriel SB, Lee C, Daly MJ, Altshuler DM and The International HapMap
Consortium.(2006). Common deletion polymorphisms in the human genome. Nat
Genet. 38:86-92.
McClay JL, Sugden K, Koch HG, Higuchi S and Craig IW. (2002). High-throughput
single nucleotide polymorphisms genotyping by fluorescent competitive allele-specific
polymerase chain reaction (SNiPTag). Anal Biochem. 301:200-206.
Evolution and Variation, The Biomedical & Life Sciences Collection, Henry Stewart
Talks Ltd, London. (online at http://hstalks.com/bio).
Meselson M and Yuan R. (1968). DNA restriction enzyme from E. coli. Nature
217:1110-1114.
Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA
and Moran JV.(2002). DNA repair mediated by endonuclease-independent LINE-1
retrotransposition. Nat Genet. 31:159-165.
Nanavutty P.(1997). The Parsis. National Book Trust, New Delhi, India.
Nicholas Awde and Asmatullah Sarwan. Pashto Dictionary & Phrasebook: Pashto-
English, English-Pashto. (Hippocrene Books, 2003, ISBN 078180972X) retrieved 10
January 2007.
Oakey R and Tyler-Smith C. (1990). Y chromosome DNA haplotyping suggests that most
European and Asian men are descended from one of two males. Genomics. 7:325-
330.
Olivo PD, Van de Walle MJ, Laipis PJ and Hauswirth WW. (1983). Nucleotide
sequence evidence for rapid genotypic shifts in the bovine mitochondrial DNA D-
loop. Nature. 306:400-402.
Ostertag EM and Kazazian HH Jr. (2001). Twin priming: a proposed mechanism for
the creation of inversions in L1 retrotransposition. Genome Res. 11:2059-2065.
Ostertag EM, DeBerardinis RJ, Goodier JL, Zhang Y, Yang N, Gerton GL and
Kazazian HHJr. (2002). A mouse model of human L1 retrotransposition. Nat Genet.
32:655-660.
Pandya A, King TE, Santos FR, Taylor PG, Thangaraj K, Singh L, Jobling MA and Tyler-
Smith C. (1998). A polymorphic human Y-chromosomal G to A transition found in
India. Ind J Hum Genet. 4:52-61.
Passarino G, Cavalleri GL, Lin AA, Cavalli-Sforza LL, Børresen-Dale AL, Underhill
PA.(2002). Different genetic components in the Norwegian population revealed by
the analysis of mtDNA and Y chromosome polymorphisms. Eur J Hum Genet.
10:521-529.
Payne R, Tripp M, Weigle J, Bodmer W and Bodmer J. (1964). A new leukocyte
isoantigen system in man. Cold Spring Harb Symp Quant Biol. 29:285-295.
Prak EL and Kazazian HH Jr. (2000). Mobile elements and the human genome. Nat
Rev Genet. 1:134-144.
Qamar R, Ayub Q, Khaliq S, Mansoor A, Karafet T, Mehdi SQ and Hammer MF.
(1999). African and Levantine origins of Pakistani YAP+ Y chromosomes. Hum Biol.
71:745-755.
Qi XQ, Bakht S, Devos KM, Gale MD and Osbourn A. (2001). L-RCA (Ligation
rolling circle amplification): a general method for genotyping of single nucleotide
polymorphism (SNPs). Nucleic Acids Res. 29: U68-U74.
Queller DC, Strassmann JE and Hughes CR. (1993). Microsatellites and kinship. Trends
Ecol Evol. 8:285-288.
Quintana-Murci L, Krausz C, Zerjal T, Sayar SH, Hammer MF, Mehdi SQ, Ayub Q,
Qamar R, Mohyuddin A, Radhakrishna U, Jobling MA, Tyler-Smith C and
McElreavey K.(2001). Y-Chromosome Lineages Trace Diffusion of People and
Languages in Southwestern Asia. Am J Hum Genet. 68:537-542.
Ramsay G. (1998). DNA chips: state of the art. Nat Biotechnol. 16:40-44.
Regueiro M, Cadenas AM, Gayden T, Underhill PA and Herrera RJ. (2006). Iran:
Tricontinental nexus for Y-chromosome driven migration. Hum Hered. 61:132-143.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero
MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs
M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R,
Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J,
Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad
DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW
and Hurles ME. (2006). Global variation in copy number in the human genome.
Nature. 444: 444-454.
Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, Marszalek JD,
Pyntikova T, van der Veen F, Skaletsky H, Page DC and Rozen S. (2006). High
mutation rates have driven extensive structural polymorphism among human Y
chromosomes. Nat Genet. 38:463-467.
Rightmire GP. (1989). Middle Stone Age humans from eastern and southern Africa. In:
P Mellars and CB Stringer (eds): The Human Revolution. Edinburgh: Edinburgh
University Press, pp 109-122.
Robertson GS. (1896). The Kafirs of the Hindu-Kush. Oxford University Press,
Karachi, Pakistan.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA and
Feldman MW.(2002). Genetic structure of human populations. Science. 298:2381-
2385.
Ruvolo ME, Zehr S, von Dornum M, Pan D, Chang B and Lin J.(1993).
Mitochondrial COII sequences and modern human origins. Mol Biol Evol 10:1115-
1135.
Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA and Arnheim N.
(1985). Enzymatic amplification of beta-globin genomic sequences and restriction
site analysis for diagnosis of sickle cell anemia. Science 230:1350-1354.
Sassaman DM, Dombroski BA, Moran JV, Kimberland ML, Naas TP, De Berardinis
RJ, Gabriel A, Swergold GD and Kazazian HH Jr. (1997). Many human L1 elements
are capable of retrotransposition. Nat Genet. 16:37-43.
Schunkert H, Hense HW, Holmer SR, Stender M, Perz S, Keil U, Lorell BH, and
Riegger GA. (1994). Association between a deletion polymorphism of the
angiotensin-converting enzyme gene and left ventricular hypertrophy. N Engl J Med.
330:1634-1638.
Schurr TG, Maggi WR, Fowler K, Wallace DC. (2000). The ethnic origins of an
enigmatic south Asian population, the Kalasha of northern Pakistan, as revealed by
mtDNA variation. Am J Hum Genet. 67:217.
Scozzari R, Torroni A, Semino O, Sirugo G, Brega A and Santachiara Benerecetti
AS.(1988). Genetic studies on the Senegal population and mitochondrial DNA
polymorphism. Am J Hum Genet. 43:534-544.
Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA,
Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder
PP, Underhill PA.(2006). Polarity and temporality of high-resolution Y-chromosome
distributions in India identify both indigenous and exogenous expansions and reveal
minor genetic influence of Central Asian pastoralists. Am J Hum Genet. 78:202-221.
Serre D and Hudson TJ. (2006). Resources for Genetic Variation Studies. Annu
Rev Genomics Hum Genet. 7:443-457.
Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM,
Clark RA, Schwartz S, Segraves R, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner
A, Gilliam TC, Trask B, Patterson N, Zetterberg A and Wigler M. (2005). Segmental
duplications and copy-number variation in the human genome. Am J Hum Genet.
77:78-88.
Shen MR, Batzer MA and Deininger PL. (1991). Evolution of the master Alu
gene(s). J Mol Evol. 33:311-320.
Shi H, Dong YL, Wen B, Xiao CJ, Underhill PA, Shen PD, Chakraborty R, Jin L, and
Su B.(2005). Y-chromosome evidence of southern origin of the East Asian-specific
haplogroup O3-M122. Am J Hum Genet 77: 408-419.
Shriver MD, Jin L, Ferrell RE and Deka R. (1997). Microsatellite data support an
early population expansion in Africa. Genome Res. 7:586-591.
Sims LM, Garvey D and Ballantyne J. (2007). Sub-populations within the major
European and African derived haplogroups R1b3 and E3a are differentiated by
previously phylogenetically undefined Y-SNPs. Hum Mutat. 28:97.
Smit AF. (1996). The origin of interspersed repeats in the human genome. Curr
Opin Genet Dev. 6:743-778.
Strachan T and Read AP.(2004). Human Molecular Genetics, 3rd ed. Garland
Science, London and New York.
Stringer CB and Andrews P. (1988). Genetic and fossil evidence for the origin of
modern humans. Science. 239:1263-1268.
Swisher CC 3rd, Curtis GH, Jacob T, Getty AG, Suprijo A and Widiasmoro. (1994). Age
of the earliest known hominids in Java, Indonesia. Science 263: 1118-1121.
Thangaraj K, Singh L, Reddy AG, Rao VR, Sehgal SC, Underhill PA, Pierson M,
Frame IG, and Hagelberg E. (2003). Genetic affinities of the Andaman Islanders, a
vanishing human population. Curr Biol. 13:86-93.
Thanseem I, Thangaraj K, Chaubey G, Singh VK, Bhaskar LV, Reddy BM, Reddy
AG, Singh L. (2006). Genetic affinities among the lower castes and tribal groups of
India: inference from Y chromosome and mitochondrial DNA. BMC Genet. 7:42.
Tishkoff SA, Dietzsch E, Speed W, Pakstis AJ, Kidd JR, Cheung K, Bonné-Tamir B,
Santachiara-Benerecetti AS, Moral P and Krings M. (1996). Global patterns of linkage
disequilibrium at the CD4 locus and modern human origins. Science. 271:1380-
1387.
Todd JA, Aitman TJ, Cornall RJ, Ghosh S, Hall JRS, Hearne CM, Knight AM, Love
JM, McAleer MA, Prins J-B, Rodrigues N, Lathrop M, Pressey A, Delarato NH,
Peterson LB and Wicker LS. (1991). Genetic analysis of autoimmune type 1
diabetes mellitus in mice. Nature. 351: 542-547.
Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H,
Albertson D, Pinkel D, Olson MV and Eichler EE.(2005). Fine-scale structural
variation of the human genome. Nat Genet. 37:727-732.
Ullu E and Tschudi C.(1984). Alu sequences are processed 7SL RNA genes.
Nature 312:171-172.
Underhill PA, Jin L, Lin AA, Mehdi SQ, Jenkins T, Vollrath D, Davis RW, Cavalli-
Sforza LL and Oefner PJ.(1997). Detection of numerous Y chromosome biallelic
polymorphisms by denaturing high-performance liquid chromatography. Genome
Res. 7:996-1005.
Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, Bonné-
Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR, Mehdi SQ,
Seielstad MT, Wells RS, Piazza A, Davis RW, Feldman MW, Cavalli-Sforza LL and
Oefner PJ.(2000). Y chromosome sequence variation and the history of human
populations. Nat Genet. 26:358-61.
Underhill PA, Passarino G, Lin AA, Shen P, Mirazon Lahr M, Foley RA, Oefner PJ,
and Cavalli-Sforza LL.(2001). The phylogeography of Y chromosome binary
haplotypes and the origins of modern human populations. Ann Hum Genet. 65:43-62.
Valdes AM, Slatkin M and Freimer NB. (1993). Allele frequencies at microsatellite loci:
the stepwise mutation model revisited. Genetics. 133:737-749.
Verkerk AJMH, Pieretti M, Sutcliffe JS, Fu Y-H, Kuhl DPA, Pizzuti A, Reiner O,
Richards S, Victoria MF, Zhang F, Eussen BE, van Ommen G-JB, Blonden LAJ,
Riggins GJ, Chastain JL, Kunst CB, Galjaard H, Caskey CT, Nelson DL, Oostra BA
and Warren S.(1991). Identification of the gene (FMR-1) containing CGG repeat
coincident with a breakpoint cluster region exhibiting length variation in fragile X
syndrome. Cell. 65:905-914.
Walter RC, Buffler RT, Bruggemann JH, Guillaume MM, Berhe SM, Negassi B,
Libsekal Y, Cheng H, Edwards RL, von Cosel R, Néraudeau D and Gagnon
M. (2000). Early human occupation of the Red Sea coast of Eritrea during the last
interglacial. Nature. 405:65-69.
Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G, Perkins
N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E,
Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C,
Rozen S, Hudson TJ, Lipshutz R, Chee M and Lander ES.(1998). Large-Scale
Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the
Human Genome. Science. 280:1077-1082.
Watkins WS, Ricker CE, Bamshad MJ, Carroll ML, Nguyen SV, Batzer MA,
Harpending HC, Rogers AR, Jorde LB.(2001). Patterns of ancestral human diversity:
an analysis of Alu insertion and restriction-site polymorphisms. Am. J. Hum Genet.
68:738-752.
Webster MT, Smith NG, Ellegren H. (2002). Microsatellite evolution inferred from
human-chimpanzee genomic sequence alignments. Proc Natl Acad Sci USA
99:8748-8753.
Wolpert S.(2000). A new history of India. Oxford University Press, New York.
Zhang F, Su B, Zhang YP and Jin L. (2007). Genetic studies of human diversity in
East Asia. Phil. Trans. R. Soc. B 362:987-995.
APPENDIX
-8-
Appendix I: List of Y-SNPs analyzed along with their primer sequences and PCR amplification conditions used in this study.
1 Apt AFLP TEK E TGG ATT GCA TTC AAC TTC ACT TAC 65.5
TEK G CTG AGT TCA AAT GCT CGG GTC TC
2 LLY22g AFLP LLY22gF CCA CCC AGT TTT ATG CAT TTG 55
LLY22gR ATA GAT GGC GTC TTC ATG AGT
3 L1Y PCR L1YF GCA CAA TGT GCA CAT GTA CCC TA
L1YR TGA TGT GTG CAT TCA TCT CAT ATA T
4 M6 DHPLC M6 F CAC TAC CAC ATT TCT GGT TGG 63, 56
M6 R CGC TGA GTC CAT TCT TTG AG
5 M8 Sequencing M8 F CCC ACC CAC TTC AGT ATG AA 56
M8 R AGG CTG ACA GAC AAG TCC AC
6 M9 AFLP M9F GCA GCA TAT AAA ACT TTC AGG 55
M9R AAA ACC TAA CTT TGC TCA AGC
7 M11 AFLP M11R TTC ATC ACA AGG AGC ATA AAC AA 55
M11F CCC TCC CTC TCT CCT TGT ATT CTA CC
8 M12 ARMS PCR M12 F ACT AAA ACA CCA TTA GAA ACA AAG G 57
M12Nor R AGC AAC ATA GTG ACC CCC AAC
M12Mut R GCA ACA TAG TGA CCC CCA AA
9A M17 AFLP M17F GTG GTT GCT GGT TGT TAC GT 60
M17R AGC TGA CCA CAA ACT GAT GTA GA
9B M17 ARMS M17FN TTG CTG GTT GTT ACG GGG 60
M17FM GTTG CTG GTT GTT ACG GGT
M17R GCT ATT CTT GTT TCT CCA GGC
10 M20 AFLP M20F GAT TGG GTG TCT TCA GTG CT 60
M20R CAC ACA ACA AGG CAC CAT C 58
11 M25 DHPLC M25 F AAA GCG AGA GAT TCA ATC CAG 63, 56
M25R TTT TAG CAA GTT AAG TCA CCA GC
12 M27 ARMS-PCR M27 F CGG AAG TCA AAG TTA TAG TTA CTG G 65
M27RNL TAT AGG AAT CGA GGT TCA GGT CAG
M27 RMT TAT AGG AAT CGA GGT TCA GGT CAC
S.NO.  Y-SNP  GENOTYPING METHOD  PRIMER DESIGNATION  PRIMER SEQUENCE  ANNEALING TEMPERATURE (°C)
13 M31 DHPLC M31 F GAA CC AGA CAA TAC GAA ATA GAA G 63, 56
M31 R TTT AGC GGC TTA TCT CAT TAC C
14 M32 DHPLC M32 F TTG AAA AAA TAC AGT GGA AC 63, 56
M32 R CAA GTG TTT AAG GAT ACA GA
15 M35 ARMS-PCR M35 FN ATT TTC CTT TGG GAC ACT AG 58
M35 FM ATT TTC CTT TGG GAC ACT AC
M35 R AGA GGG AGC AAT GAG GAC A
16 M36 DHPLC M36 F AGA TCA TCC CAA AAC AAT CAT AA 63, 56
M36 R AAG GCT GAA ATC AAT CCA ATC TG
17 M38 Sequencing M38 F CAG TTT TTA GAG AAT AAT GTC CT 63, 56
M38 R TTA AAG AAA AGA AAA GCA GAT G
18 M45 DHPLC M45F GCT GGC AAG ACA CTT CTG AG 63, 56
M45R AAT ATG TTC CTG ACA CCT TCC
19 M48 ARMS-PCR M48 FN TGA CAA TTA GGA TTA AGA ATA TTA TA
M48 FM TGA CAA TTA GGA TTA AGA ATA TTA TG
M48R AAA ATT CCA AGT TTC AGT GTC ACA TA
20 M50 DHPLC M50 F CGG CAA CAG TGA GGA CAG T 63, 56
M50 R TGC TTC AGG AGA TAG AGG CTC
21 M52 ARMS-PCR M52FC TAT CGG CCT CCT GAG TAC CTG 60
M52RG CAA GAA ACC TAT CAA ACA TCC G
M52FM CAA GAA ACC TAT CAA ACA TCC TC
22 M56 ARMS PCR M56R TCT CAT TGC TGC CTC TCT TTA 55
M56FNL GCA ATG GGA GGA TTA CGA CA
M56FMT GCA ATG GGA GGA TTA CGA CT
23 M60 DHPLC M60 F GCA CTG GCG TTC ATC ATC T 63, 56
M60 R ATG TTC ATT ATG GTT CAG GAG G
24 M62 ARMS-PCR M62 FNL GGA ATT AAT TAT TTC TCT TTC TCA T 54
M62 FMT GGA ATT AAT TAT TTC TCT TTC TCA C
M62 R TGG TGG CAT GTG CCT GTG TT
25 M67 ARMS-PCR M67 F CCA TAT TCT TTA TAC TTT CTA CCT 55
M67 RNL TCG TGG ACC CCT CTA TAC A
M67 RMT TCG TGG ACC CCT CTA TAC T
26 M69 DHPLC M69 F GGT TAT CAT AGC CCA CTA TAC TTT G 63, 56
M69 R ATC TTT ATT CCC TTT GTC TTG CT
27 M70 ARMS-PCR M70 FNL GGA CTC ATG TCT CCA TGA GTA 58
M70 FMT GGA CTC ATG TCT CCA TGA GTC
M70 R ATC TTT ATT CCC TTT GTC TTG CT
28 M73 DHPLC M73 F CAG AAT AAT AGG AGA ATT TTT GGT 63, 56
M73 R ATT TTC CTT ATT TTC TAA GCA GC
29 M74 DHPLC M174 F ATG CTA TAA TAA CTA GGT GTT GAA G 63, 56
M174 R AAT TCA GCT TTT ACC ACT TCT GAA
30 M76 DHPLC M76 F TAG AAG TAG CAG ATT GGG AGA GG 63, 56
M76 R CCT GAT AAA ATG AAA AAA ATG GTC
31 M78 ARMS-PCR M78 F TGG TTC TCC ACT ACA GGA GA 61
M78 RN ATT TTG AAA TAT TTG GAA GGG TG
M78RM TAT TTT GAA ATA TTT GGA AGG GTA
32 M82 DHPLC M82 F CTG TAC TCC TGG GTA GCC TGT 63, 56
M82 R AAG AAC GAT TGA ACA CAC TAA CTC
33 M87 DHPLC M87 F TCC CAT TAT TTG CTA TAT TTG CT 55
M87 RNL AAC AAG CTG GCA TCA GAA TAT AA
M87RMT CAA GCT GGC ATC AGA ATA TAG
34 M88 Sequencing M88 F ATT CTA GGG TCA GGC AAC TAG G 63, 56
M88 R TGT TTG TTC TAT TCT ATG GTC TTC C
35 M89 ARMS-PCR M89 F AGA AGC AGA TTG ATG TCC CAC T 62
M89 RNL AAC TCA GGC AAA GTG AGA GAA G
M89 RMT AAC TCA GGC AAA GTG AGA GAA A
36 M91 DHPLC M91F GAG CTT GGA CTT TAG GAC GG 63, 56
M91R AAA CTT TAA GGC ACT TCT GGC
37 M92 ARMS-PCR M92 F GGC CTT ATA AGA TTG GCA TAC 62
M92 RNL CTA AAT ACT GTT GGA GCC TAT A
M92 RMT CTA AAT ACT GTT GGA GCC TAT G
38 M97 DHPLC M97 F GTT GCC CTC TCA CAG AGC AC 63, 56
M97R AAG GTC ACT GGA AGG ATT GC
39 M101 DHPLC M101 F TCA CAG CAG CTT CAG CAA A 63, 56
M101 R ATA AAA ATT AGA CTC TGT GTT ACT AGC
40 M103 DHPLC M103 F CAG TAA GTG AAC TCA CAC ATA ATT CC 63, 56
M103 R CCA GTT TTA TTT CAG TTT CAC AGC
41 M109 DHPLC M109 F GGG TAT CAA AAT GTC TTC AAC CT 63, 56
M109 R GGG AAT TTC CTG CTA CTT GC
42 M110 Sequencing M110F CAG GGA AGG ACC GTA AAA GG 63, 56
M110 R ATG TTT ATC ATG TGC AGT AAA GGT T
43 M111 Sequencing M111 F AAT CTT CTG CAA AGG GTT CC 63, 56
M111 R CAG CTA CAA AAC AAA ATA CTG GAC
44 M117 DHPLC M117 F AAG TAT GAC TTA TGA AGT ACG AAG AAA 63, 56
M117 R ATT CAG TTA GAT TTT ACA ATG AGC A
45 M119 DHPLC M119 F GAA TGC TTA TGA ATT TCC CAG A 63, 56
M119 R TTC ACA CAA TAT ACA AGA TGT ATT CTT
46 M122 ARMS-PCR M122FN AAT TGA GAT ACT AAT TCA T 50
M122FM AAT TGA GAT ACT AAT TCA C
M122R AAA ACT TTA TCA TAT TGA G
47 M123 ARMS-PCR M123 F CAG CGA ATT AGA TTT TCT TGC 58
M123RN GTA TCT GAA CTA GCA TAT CTG
M123RM AGT ATC TGA ACT AGC ATA TCT A
48 M124 ARMS-PCR M124 F TGC CTT TTG GAA ATG AAT AAA TC 60
M124 N ACA AAC TCA GTA TTA TTA AAC CG
M124 R ACA AAC TCA GTA TTA TTA AAC CA
49 M133 DHPLC M133 F TGA AAT GGA AAT CAA TAA ACT CAG T 63, 56
M133 R CCT TTT CTT TTT CTT TAA CCC TTC
50 M134 DHPLC M134 F AGA ATC ATC AAA CCC AGA AGG 63, 56
M134 R TCT TTG GCT TCT CTT TGA ACA G
51 M136 DHPLC M136 F ATG TGA AGA CAA CAC TGT GTG G 63, 56
M136 R TTG TGG TAG TCT TAG TTC TCA TGG
52 M143 DHPLC M143 F ATG CTA TAA TAA CTA GGT GTT GAA G 63, 56
M143 R AAT TCA GCT TTT ACC ACT TCT GAA
53 M147 Sequencing M147 F GTA TTC TGG GGC AAT TTT AGG 94-63-56-72 94-56-72
M147 R TTG ATA CAA GAG GTT ATT TTA AGC A 0.5 °C decrease/cycle
54 M148 DHPLC M148 F AAC AGA ATT ATC AGG AAA AGG TTT 63, 56
M148 R TTT TAC TTG TTC GTG TAC TTT CAA
55 M150 DHPLC M150 F GCA GTG GAG ATG AAG TGA GAC 63, 56
M150 R CCT ACT TTC CCC CTC TTC TG
56 M152 DHPLC M152 F AAG CTA TTT TGG TTT CTT TCA 63, 56
M152 R GCC TTG TGT GGG TAT GAT TG
57 M157 DHPLC M157F GCT GGC AAG ACA CTT CTG A 55
M157RNL ACC AAA GGT CAT TTG TGG AT
M157RMT CCA AAG GTC ATT TGT GGA G
58 M170 ARMS-PCR M170 N TAT TTA CTT AAA AAT CAT TGT TCA 56
M170FCmutant TAT TTA CTT AAA AAT CAT TGT TCC
M170 Rnormal CTT TTT TCA GTT CTT CAT CAG TTA
59 M172 ARMS-PCR M172 FNL CCC AAA CCC ATT TTG ATG CTA T 61
M172 FMT CCC AAA CCC ATT TTG ATG CTA G
M172 R TCA CAG TGG ATC CAT CTT CAC T
60 M173 ARMS-PCR M173 N AAT TCA AGG GCA TTT AGA ACA
M173 FC AAT TCA AGG GCA TTT AGA ACC 56
M173R TAT CTG GCA TCC GTT AGA AAA G 55
61 M175 Sequencing M175 F TTG AGC AAG AAA AAT AGT ACC CA 94-63-56-72 94-56-72
M175 R CTC CAT TCT TAA CTA TCT CAG GGA 0.5 °C decrease/cycle
62 M177 Sequencing M177 F TTT AAC ATT GAC AGG ACC AG 94-63-56-72 94-56-72
M177 R GTG TTG GTT CTC CTG TAA AG 0.5 °C decrease/cycle
63 M185 DHPLC M185 F GGA GTA CCT ATC ACT GAA TGT GC 63, 56
M185 R GTC ATT CAT TTC TGC TTG GAA C
64 M193 DHPLC M193 F GCC TGG ATG AGG AAG TGA G 63, 56
M193 R GCC TTC TCC ATT TTT GAC CT
65 M201 ARMS-PCR M201 FN AAT AAT CCA GTA TCA ACT GAG AG 56
M201 FM TAA TAA TCC AGT ATC AAC TGA GAT
M201 R GTT CTG AAT GAA AGT TCA AAC GT
66 M207 ARMS-PCR M207 FN TAA GTC AAG CAA GAA ATT TTA 56
M207 FD TAA GTC AAG CAA GAA ATT TTG 52
M207 R CAA AAT TCA CCA AGA ATC CTT G
67 M214 ARMS-PCR M214 F CAA GCG TAG AGG TAT TAC TAC AA 66
M214RNL TGA GAC ACT GTC TGA AAA CAA TA
M214 RMT TGA GAC ACT GTC TGA AAA CAA TG
68 M217 Sequencing M217 F GCT TAT TTT TAG TCT CTC TTC CAT 63, 56
M217 R ACC TGT TGA ATG TTA CAT TTC TTT
69 M218 DHPLC M218 F TTG TGA GTT TTT TTC CAT CAA TC 63, 56
M218 R TTT ATT GAC GAT GGT ATT AGA AGA G
70 M231 DHPLC M231F CCT ATT ATC CTG GAA AAT GTG G 63, 56
M231R ATT CCG ATT CCT AGT CAC TTG G
71 M242 ARMS-PCR M242 F AAC TCT TGA TAA ACC GTG CTG 61
M242 RNL CAC GTT AAG ACC AAT GCC ATG
M242 RMT CAC GTT AAG ACC AAT GCC ATA
72 M267 ARMS-PCR M267 F TTA TCC TGA GCC GTT GTC C
M267 RNL CCA CAC AAA ATA CTG AAC GAT 62
M267 RMT CCA CAC AAA ATA CTG AAC GAC 58
73 M317 DHPLC M317 F TGG TTC TAC AGT TGG GAT TTT G 63, 56
M317 R CCT TAA TAA CCG AGG CAC AA
74 M343 ARMS-PCR M343 F TTT AAC CTC CTC CAG CTC TG
M343RNL CCA CAT ATC TCC AGG TCT AG
M343RMT CCA CAT ATC TCC AGG TCT AT
75 M349 ARMS-PCR M349 F TGG GAT TAA AGG TGC TCA TG 58
M349RN CCT AAG GTC AGA AAG TTT TAA C
M349 RM CCT AAG GTC AGA AAG TTT TAA A
76 M357 DHPLC M357 F CCC CGT TTT TTC CTC TCT GCC 63, 56
M357 R CAC GTA ACC TGG GAT GGT CAT A
77 P15 DHPLC P15F AGA GAG TTT TCT AAC AGG GCG 63, 56
P15R TGG GAA TCA CTT TTG CAA CT
78 P31 Sequencing P31 F TAA GGC TGC GTG TTC CCT AT 63, 56
P31 R GCA CTG TCA CTG TGG ATG TT
79 PK1 AFLP PK1 F TCA ACT TTC TTA AAT GAT TGT ACG TT
PK1 R TCT GTT CAG GAG AAC CTC TAT GG
80 PK2 ARMS-PCR PK2 F TGT GTC CTG GTG TCT TTT GG 67
PK2 RN GGT GTA CAA AAT AGT TTT TGT TTT TGA TCT AA
PK2 RM GGT GTA CAA AAT AGT TTT TGT TTT TGA TCT CG
81 PK3 ARMS-PCR PK3 F TGT GTC CTG GTG TCT TTT GG 68
PK3 N AAA GCC ACC ATC TCA AGA TGG TGT ACT A
PK3 M AAA GCC ACC ATC TCA AGA TGG TGT ACT G
82 PK4 DHPLC PK4 F CCA TCC TCC CAT GGC TAG T 63, 56
PK4 R GCT TCC AAG GTG CCC TTT AT
83 PK5 AFLP PK5 F TTC CAA ACA CAT GCT TCT GC 58.5
PK5 R TAA AAA GGA GGA GGG ACT GC
84 RPS4Y AFLP RPS4Y L CCA CAG AGA TGG TGT GGG TA 61
RPS4Y R GAG TGG GAG GGA CTG TGA GA
85 SRY+465 AFLP SRY13 GCC GAA GAA TTG CAG TTT 58
SRY14 GTT GAT GGG CGG TAA GTG GC
86 SRY1532 AFLP SRY1 TCC TTA GCA ACC ATT AAT CTG G 60
SRY2 AAA TAG CAA AAA ATG ACA CAA GGC
87 SRY2627 AFLP SRY-2627 F CGC GGC TTT GAA TTT CAA GCT CTG 63
SRY-2627 R TAA GAG TCC CTC GGG GCC CTG G
88 SRY8299 AFLP SRY8299 R ACA GCA CAT TAG CTG GTA TGA C
SRY8299 F TCT CTT TAT GGC AAG ACT TAC G
89 sY81 AFLP SY810.1 AGG CAC TGG TCA GAA TGA AG 56
SY810.2 AAT GGA AAA TAC AGC TCC CC
90 TAT AFLP TAT 1 GAC TCT GAG TGT AGA CTT GTG A 60
TAT 3 GAA GGT GCC GTA AAA GTG TGA A
91 YAP PCR YAP 1 CAG GGG AAG ATA AAG AAA TA 59
YAP 2 ACT GCT AAA AGG GGA TGG AT
92 12f2 PCR 12F2 F TCT TCT AGA ATT TCT TCA CAG AAT TG 59
12F2 D CTG ACT GAT CAA AAT GCT TAC AGA TC
93 92R7 AFLP 92R7 L GCC TAT CTA CTT CAG TGA TTT CT 62
92R7 L (R) GAC CCG CTG TAG ACC TGA CT
92R7 A TGC ATG AAC ACA AAA GAC GTA 65
92R7 B GCA TTG TTA AAT ATG ACC AGC
[Figure: phylogenetic tree of Y-chromosome haplogroups A to T with the defining markers on each branch (including Pk1 to Pk5), clade numerals I to X, and the regional labels AFRICA, JAPAN, EURASIA, INDIA, NORTH EUROPE, ASIA, INDUS VALLEY, NEW GUINEA, EUROPE, AUSTRALASIA, AMERICA, and OCEANIA & INDONESIA.]