Felkl Et Al, 2023. Ancestry Resolution of South Brazilians by Forensic 165 Ancestry-Informative SNPs Panel
ARTICLE INFO

Keywords: Brazilian population; Population genetics; Biogeographic ancestry; Massively parallel sequencing; Precision ID Ancestry Panel

ABSTRACT

Forensic DNA phenotyping (FDP) includes biogeographic ancestry (BGA) inference and externally visible characteristics (EVCs) prediction directly from an evidential DNA sample as alternatives to provide valuable intelligence when conventional DNA profiling fails to achieve identification. In this context, the application of Massively Parallel Sequencing (MPS) methodologies, which enable simultaneous typing of multiple samples and hundreds of forensic markers, has been gradually implemented in forensic genetic casework. The Precision ID Ancestry Panel (Thermo Fisher Scientific, Waltham, USA) is a forensic multiplex assay consisting of 165 autosomal SNPs designed to provide biogeographic ancestry information. In this work, a sample of 250 individuals from Rio Grande do Sul (RS) State, southern Brazil, apportioned into four main population groups (African-, European-, Amerindian-, and Admixed-derived Gauchos), was evaluated with this panel to assess the feasibility of this approach in a highly heterogeneous population. Forensic descriptive parameters estimated for each population group revealed that this panel has enough polymorphic and informative SNPs to be used as a supplementary instrument in forensic individual identification and kinship testing regardless of ethnicity. No statistically significant deviation from Hardy-Weinberg equilibrium was observed after Bonferroni correction. However, seven loci pairs displayed linkage disequilibrium in pairwise LD testing (p < 3.70 × 10⁻⁶). Interpopulation comparisons by FST analysis, MDS plot, and STRUCTURE analysis among the four RS population groups, alone and together with 89 worldwide reference populations, demonstrated that Admixed- and African-derived Gauchos present the highest levels of admixture and population stratification, whereas European- and Amerindian-derived Gauchos exhibit a more homogeneous genetic conformation.
1. Introduction

Forensic DNA phenotyping (FDP) includes biogeographic ancestry (BGA) inference and externally visible characteristics (EVCs) prediction directly from an evidential DNA sample as alternatives to provide valuable intelligence when conventional DNA profiling fails to achieve an identification [1]. FDP may reduce the pool of potential suspects and hence guide investigations to find previously unknown perpetrators, as well as help identify missing persons or mass disaster victims [2]. The indirect method of evaluating physical appearance provided by BGA is based on a set of distinctive, particular features presented by some human groups, which are easily perceivable and recognized as characteristic of such groups. As an example, pigmentation traits are among the most distinguishing of these physical appearance elements. Different aspects of phenotypic expression can, therefore, be correlated with the different levels of genetic structure observed in human populations, and as such have been widely explored and investigated through techniques based on ancestry-informative markers (AIMs), mainly autosomal single nucleotide polymorphisms (SNPs) [3]. AIMs present marked allele frequency divergences among populations from different geographic regions and are useful for determining an individual’s likely biogeographic ancestry or population of origin.
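To make the notion of "marked allele frequency divergence" concrete, the following minimal Python sketch compares one allele's frequency in two populations using the absolute difference (delta) and a simple per-locus FST of the form (HT − HS)/HT. The function names and the allele frequencies are hypothetical and chosen only for illustration; this is not the AMOVA-based estimator reported later in the paper, and it applies no sample-size correction.

```python
def allele_frequency_delta(p1: float, p2: float) -> float:
    """Absolute difference in one allele's frequency between two populations."""
    return abs(p1 - p2)


def two_population_fst(p1: float, p2: float) -> float:
    """Simple per-locus FST = (HT - HS) / HT for a biallelic SNP.

    Both populations are weighted equally; no sample-size correction.
    """
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)                        # pooled expected heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2     # mean within-population value
    if h_total == 0:
        return 0.0  # same allele fixed in both populations: no differentiation
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies (illustration only, not data from this study)
markers = {
    "strongly_divergent_marker": (0.90, 0.10),
    "weakly_divergent_marker": (0.52, 0.48),
}

for name, (p1, p2) in markers.items():
    delta = allele_frequency_delta(p1, p2)
    fst = two_population_fst(p1, p2)
    print(f"{name}: delta = {delta:.2f}, FST = {fst:.4f}")
```

A marker whose allele is common in one population and rare in the other gives a large delta and FST and is therefore informative for ancestry, whereas a marker with similar frequencies in both populations contributes almost nothing.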
* Correspondence to: Laboratório de Genética Forense, Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, Av. Ipiranga,
6681, Prédio 12C, Sala 233, Porto Alegre, RS 90619-900, Brazil.
E-mail address: [email protected] (A.B. Felkl).
https://doi.org/10.1016/j.fsigen.2023.102838
Received 24 August 2022; Received in revised form 15 January 2023; Accepted 22 January 2023
Available online 23 January 2023
1872-4973/© 2023 Elsevier B.V. All rights reserved.
A.B. Felkl et al. Forensic Science International: Genetics 64 (2023) 102838
Forensic DNA analysis is often confronted with highly degraded and contaminated samples, requirements for high precision and reproducibility, besides time and cost considerations. In this sense, the advent of Massively Parallel Sequencing (MPS) techniques – used for simultaneous typing of a large number of targeted markers, with high throughput and consequently reduced analysis time – had a hugely positive effect on forensic sciences [4,5]. Soon after, commercial SNP-panel-based kits for sequencing on high-throughput platforms were introduced to the forensic community. Precision ID Ancestry Panel (formerly HID Ion AmpliSeq™ Ancestry Panel) comprises a set of 165 autosomal ancestry-informative SNPs (AISNPs) previously selected by two laboratories [6,7] and made commercially available by Thermo Fisher Scientific (TFS; Waltham, MA, USA) for BGA inference. The average amplicon length is 120–130 bp, designed to allow successful processing of highly degraded, low-input, and other forensically challenging samples.

The Brazilian population is multicultural and multiethnic, with a complex demographic history characterized by intense and heterogeneous admixing processes that encompass three large continental groups – Native Americans (NAM), European (EUR) settlers, and enslaved Sub-Saharan Africans (AFR) [8,9]. The influx of European settlers at the end of the 15th century, mostly coming from the Iberian Peninsula, culminated in both asymmetric mating with Amerindian women and a drastic reduction of the native people due to diseases and conflicts. Soon after, a large contingent of Africans, mostly from Western African territory (Senegal, Gambia, and Guinea-Bissau), was forcibly brought to Brazil as slaves. In the following two centuries, Africans were brought from Angola and Congo; and in the 19th century, the predominant component was from Mozambique [10]. Finally, late migratory movements occurred in the 19th and 20th centuries, with the arrival of Europeans (predominantly Germans, Italians, Portuguese, and Spaniards) and Asian migrants (essentially from Japan and Middle East countries). These peoples met and mated among themselves in different ways, giving rise to a highly admixed multiethnic population [11].

Brazilian territorial occupation followed variable patterns of multidirectional introgression according to social and historical conditions, and these patterns vary significantly for each distinct geographical region [12]. Heterogeneous processes of migratory flows led to marked divergences in regional ethnic composition, and distinctive proportions of parental population (NAM, EUR, and AFR) contributions in present-day geopolitical regions are noticeable [13]. Rio Grande do Sul (RS) is the southernmost State of Brazil, with a current estimate of approximately 11 million inhabitants. The history of RS is peculiar since its effective colonization started only in the 18th century. At the time the first Europeans arrived, the region was inhabited by Native Americans belonging basically to three major groups: (1) Guarani; (2) Kaingang; and (3) Pampean tribes [14]. The African contingent established in south Brazil seems to have come mostly from South and East African coasts (current Angola and Mozambique), as well as from the West-Central African region [15]. From the 19th century onwards, large inflows of Germans and Italians gradually transformed the RS profile, giving its population one of the highest European ethnic compositions in the country.

The present study characterizes the 165 SNPs included in the Precision ID Ancestry Panel (TFS; Waltham, MA, USA) in four main RS State (southern Brazil) population groups (also termed “Gauchos”). We analyzed forensic parameters and conducted population structure analyses among the four population groups, alone and together with 89 worldwide reference populations, aiming to scrutinize genetic diversity, similarity levels, ancestry inference, and population stratification of the investigated population groups.

2. Materials and methods

2.1. Ethical Statement

All samples analyzed in this study were obtained from voluntary donors following informed consent. This work is in accordance with the ethical principles stated in the World Medical Association’s Helsinki Declaration [16] and was approved by the National Research Ethics Committee of the CEP/Conep system via Plataforma Brasil, under CAAE number 15620919.3.0000.5336.

2.2. Samples, DNA extraction, and quantification

Oral swabs were obtained from 250 unrelated voluntary donors in the metropolitan region of Porto Alegre, Rio Grande do Sul (RS) State, southern Brazil. The population sample comprises 130 women and 120 men, with ages ranging from 18 to 75 years. Subjects provided phenotypic, ethnic, and ancestry information in a self-evaluation form and agreed to the photographic registry. Based on self-declared data and hetero-attribution by multivariate phenotypic evaluation (including eye, skin and hair color, and hair and facial morphology), volunteers were apportioned into four categories: European-derived Gauchos (EURS, n = 92), African-derived Gauchos (AFRS, n = 62), Amerindian-derived Gauchos (AMRS, n = 22, obtained from direct descendants of Guarani and Kaingang population groups from RS), and Admixed-derived Gauchos (ADRS, n = 74, characterized by an admixture of two or three parental populations declared by family history and verified by phenotype evaluation).

Genomic DNA from buccal swabs was extracted with a standard phenol-chloroform-isoamyl alcohol protocol. Extracted DNA was quantified using a Qubit™ 2.0 Fluorometer with the Qubit™ dsDNA High Sensitivity (HS) Assay Kit (TFS; Waltham, MA, USA) according to the manufacturer’s recommendations.

2.3. Library preparation, quantification, and sequencing – Precision ID Ancestry Panel

Library preparation of 132 samples was performed using the Ion AmpliSeq™ Library Kit 2.0 (TFS; Waltham, MA, USA) combined with the HID-Ion AmpliSeq™ Ancestry Panel (TFS; Waltham, MA, USA). Genomic DNA targets were amplified in a final reaction volume of 20 μL containing 1 μL of template DNA (1 ng), 4 μL of 5x Ion AmpliSeq™ Hi-Fi Mix, 10 μL of 2x Ion AmpliSeq™ primer pool (Ancestry Panel), and 5 μL of nuclease-free water. The PCR reaction was performed in a Veriti 96-well Thermal Cycler (TFS; Waltham, MA, USA) under the following conditions: enzyme activation at 99 °C for 2 min, 21 cycles at 99 °C for 15 s and at 60 °C for 4 min, and holding at 10 °C. PCR amplicons were partially digested with 2 μL of FuPa reagent and incubated at 50 °C for 10 min, 55 °C for 10 min, 60 °C for 20 min, and held at 10 °C for up to 1 h. Adapter ligation was performed by adding to the 22 μL of digested amplicon: 4 μL of Switch Solution, 0.5 μL of Ion P1 Adapter, 0.5 μL of Barcode X (X was chosen from the Ion Xpress™ Barcode Adapters 1–96 Kit or the IonCode™ Barcode Adapters 1–384 Kit for different samples), 1 μL of nuclease-free water, and 2 μL of DNA ligase, followed by incubation at 22 °C for 30 min, 72 °C for 10 min, and holding at 10 °C for up to 1 h. After barcode adapter ligation, libraries were purified with 45 μL of 1.5x Agencourt® AMPure® XP Reagent (Beckman Coulter, FL, USA) and washed two times using freshly prepared 70% ethanol (EtOH), according to the manufacturer’s instructions.

To assess yield and allow subsequent normalization, diluted libraries (9 μL at 1:100 dilution) were quantified using a 7500 Real-Time PCR System (TFS; Waltham, MA, USA) with the Ion Library TaqMan™ Quantitation Kit (TFS; Waltham, MA, USA). Then multiple libraries diluted to 20 pM were pooled in equal volumes for template preparation.

A 25 μL sample of the pooled library was added to the amplification solution to generate template-positive Ion Sphere Particles (ISPs). Emulsion-based clonal amplification (emPCR) was performed on the Ion OneTouch™ 2 Instrument (TFS; Waltham, MA, USA) with the Ion PGM™ Hi-Q™ View OT2 Kit (TFS; Waltham, MA, USA). Template-positive ISPs were enriched on the Ion OneTouch™ Enrichment System (TFS; Waltham, MA, USA). Both emPCR and enrichment were conducted following the manufacturer’s protocol (Revision A.0) [17].

Controls and sequencing primers were added to enriched, template-
positive ISPs. Sequencing was run on the Ion Torrent™ PGM™ Instrument (TFS; Waltham, MA, USA) using an Ion PGM™ Hi-Q™ Sequencing Kit (TFS; Waltham, MA, USA) and an Ion 318™ Chip v2 (TFS; Waltham, MA, USA). A final volume of 30 μL was loaded per chip, according to the manufacturer’s instructions (Revision C.0) [18]. Two chips with approximately 65 samples each were used in distinct runs for complete sample set genotyping.

2.4. Library preparation, quantification, and sequencing – AmpliSeq™ Custom DNA Panel for Illumina®

Primers were designed with the BaseSpace™ DesignStudio™ Sequencing Assay Designer Software (Illumina, CA, USA), using AmpliSeq DNA Hotspot and GRCh38.p2 as the reference human genome, at high stringency, a maximum amplicon length of 375 bp, and 100% coverage for the same 165 target SNPs included in the HID-Ion AmpliSeq™ Ancestry Panel. Genomic DNA of 68 samples was diluted to 10 ng, the standard input recommended by the manufacturer’s protocol (Document # 1000000036408 v08) [19]. Library preparation was performed using AmpliSeq™ Library PLUS for Illumina® and AmpliSeq™ Custom DNA Panel for Illumina®. Genomic DNA targets were also amplified in a final reaction volume of 20 μL, but with 6 μL of template DNA (10 ng), 4 μL of 5x AmpliSeq™ Hi-Fi Mix, and 10 μL of 2x AmpliSeq™ Custom DNA Panel. The PCR reaction was also performed in a Veriti 96-well Thermal Cycler, but under the following parameters: enzyme activation at 99 °C for 2 min, 18 cycles at 99 °C for 15 s and at 60 °C for 8 min, and holding at 10 °C. The changes in time and number of cycles took the 375 bp amplicon length into account. Amplicons were partially digested as in the Precision ID Ancestry Panel library preparation. Ligation of indexes I7 and I5 to each sample was conducted using AmpliSeq™ CD Indexes Set A for Illumina®, by adding to the 22 μL of digested amplicon: 4 μL of Switch Solution, 2 μL of AmpliSeq CD Indexes, and 2 μL of DNA Ligase, followed by incubation at 22 °C for 30 min, 68 °C for 5 min, 72 °C for 5 min, and holding at 10 °C for up to 24 h. Libraries were purified with 30 μL of AMPure® magnetic beads and washed twice with freshly prepared 70% EtOH. A second amplification step was performed to guarantee a sufficient library quantity for sequencing on the MiSeq® System, as follows: 45 μL of 1x Lib Amp Mix and 5 μL of 10x Lib Amp Primers were added to each library well and incubated at 98 °C for 2 min, then 7 cycles of 98 °C for 15 min and 64 °C for 1 min, and held at 10 °C. Subsequently, libraries were subjected to two purification steps using AMPure® magnetic beads and freshly prepared 70% EtOH.

A Qubit™ 2.0 Fluorometer and the Qubit™ dsDNA HS Assay Kit were used to quantify the libraries. Next, libraries were diluted to the starting concentration (2 nM) and pooled with 10 μL of each, then denatured with 0.2 N NaOH and diluted to a final loading concentration of 9 pM following the manufacturer’s instructions (Document # 15039740 v10) [20]. Sequencing was performed using the MiSeq® Reagent Kit v2 (500-cycles) on a MiSeq® System instrument (Illumina, CA, USA).

2.5. Sequencing data analysis

2.5.1. Ion Torrent™ PGM™

Signal processing (DAT files), base calling, and generation of unmapped and mapped BAM files (with Homo sapiens hg19 as the reference genome for alignment) were conducted using Torrent Suite™ Software v5.0 (TFS; Waltham, MA, USA). The Coverage Analysis v5.0 and Torrent Variant Caller v5.0 plugins were used to calculate the number of mapped reads and perform variant calling, respectively. SNP genotypes were called under standard analysis settings by the HID SNP Genotyper v4.3.2 plugin, which allows genotype filtering at specific locations given in the hotspot file (here, the 165 SNPs that compose the Precision ID Ancestry Panel). Minimum coverage was set to six reads per base position, and heterozygote allelic calls followed a maximal 70/30 imbalance rate, considering previous studies in which allelic imbalance was observed in some genetic markers for HID Ion Ampliseq Precision kits [21,22]. All genotypes and base calls were manually checked by at least two independent reviewers.

2.5.2. MiSeq® System

The BaseSpace™ Sequence Hub DNA Amplicon v2.0 App was used to analyze the AmpliSeq™ Custom DNA Panel. Per-sample reads (FASTQ files) were aligned with the BWA algorithm against the reference genome (Homo sapiens GRCh38). Variant calling was performed by the Pisces Variant Caller at a Depth Filter level of 10 and annotated by the Illumina Annotation Engine using RefSeq transcripts. A VCF file containing variants of interest was uploaded to the project for SNP genotype calling.

2.5.3. Low-pass full genome sequencing

A subset of samples comprising 50 individuals was subjected to full genome sequencing through an external service provider (Gencove Inc., NY, USA). Full sequencing was attained with 1x coverage on an Illumina NextSeq 2000 instrument (Illumina, CA, USA) following library preparation and workflow according to the company’s internal procedures, including sequencing protocols and data processing, as described by Wasik and collaborators [23]. Results were provided as data files in different formats, and genotypes were extracted from the provided VCF files using a custom Python script. In these files, genetic data are displayed as genotype posterior probabilities, since the bioinformatics pipeline adopted by Gencove includes an imputation step based on the model proposed by Li and Stephens [24] to predict variants located in low-coverage regions or undetected during sequencing. A threshold value of 0.98 for the genotype probability was adopted to reduce errors, and the rate of genotype calls under the adopted threshold was less than 0.5% (evenly distributed among all 165 SNPs, with no preferential sites for unreliable calls). Genotype calls with a reported posterior probability under 0.98 were assigned as missing data.

The conversion of exported SNP genotype data to downstream software formats was done with PGDSpider v.2.1.1.5 [25]. Allele frequencies of the 165 SNPs and corresponding forensic statistical parameters, including observed heterozygosity (Ho), expected heterozygosity (He), polymorphism information content (PIC), match probability (MP), power of discrimination (PD), power of exclusion (PE), and typical paternity index (TPI), were calculated using the STR Analysis for Forensics (STRAF) v.1.0.5 [26] online software (available at http://cmpg.unibe.ch/shiny/STRAF/). Random match probability (RMP) calculations were performed with validated, in-house Excel-based workbooks. The exact test of Hardy-Weinberg equilibrium (HWE) and the linkage disequilibrium (LD) test were performed using Arlequin v3.5.2.2 [27]. HWE analysis was carried out with 1,000,000 Markov Chain Monte Carlo (MCMC) steps and 1,000,000 dememorization steps. Correction for multiple testing was done according to the method suggested by Bonferroni [28], by dividing the significance level of 0.05 by the number of tests.

2.5.4. Data merging and population analyses

For comprehensive analyses of the populations’ genetic relationships, our data were combined with genotypic profiles from 89 worldwide reference populations for the 165 AISNPs included in the Precision ID Ancestry Panel (TFS; Waltham, MA, USA). Twenty-four populations were extracted from the 1000 Genomes (1 kG) Project [29] Phase III and merged with previously published Basques [30] and Chinese Uyghur and Hui [31]. Genotypic data for Danes and Somalis [32] were kindly provided by Professor Niels Morling and collaborators. Sixty worldwide populations [33] genotyped at the Kidd Lab and kindly provided by Professor Kenneth K. Kidd and collaborators also compose the reference population set. Details of the populations used in the present study and their abbreviations are listed in Supplementary Table S1.

A population differentiation test based on pairwise FST genetic distances and analysis of molecular variance (AMOVA) among our four studied population groups, alone and together with the 89 worldwide populations, were performed using Arlequin v3.5.2.2. Based on pairwise FST values,
the Multidimensional Scaling (MDS) technique was applied using IBM® SPSS® Statistics v25.0 [34]. Individual ancestry proportions were evaluated using STRUCTURE v.2.3.4 [35], with ten independent runs for each K value, ranging from K = 2 to K = 20. 100,000 burn-in steps followed by 100,000 MCMC repetitions were applied, and ‘admixture’ and ‘correlated allele frequencies’ models were considered [36]. Summarization and graphical representation of STRUCTURE results were generated using the Cluster Markov Packager Across K (CLUMPAK) online server (available at http://clumpak.tau.ac.il/) [37]. To identify the K value that captures the uppermost structure level, we used Structure Harvester v.0.6.94 [38], which implements the Evanno method [39].

3. Results and discussion

Three distinct sequencing procedures were adopted to generate genetic profiles of 165 AISNPs in 250 unrelated South Brazilian subjects. A further study evaluating the comparative sequencing performance of these methods is underway. The employed panel comprises 55 autosomal biallelic SNPs from the AIM set developed by the Kidd group [6,33] and 123 from Seldin’s AIM set [7] (13 markers are included in both panels; see Supplementary Table S2 for SNP details) and aims to provide biogeographic ancestry information to guide investigative processes. The commercial kit was designed to properly handle degraded DNA samples, with an average targeted amplicon size of less than 130 bp. Several populations have been investigated using this panel to infer genomic ancestry and population stratification, including Asians (Uyghur and Hui [31], Japanese [40], Chinese Tibetan-Burmese [41], Uyghur and Kazakh [42,43], and other Asian populations [44]), Europeans (Basques [30], Danes [32], and Greenlanders [45]), South Americans (Ecuadorians [46]), Middle Eastern populations (Turks and Iranians [47]) and Africans (Somalis [32]). In the present study, samples obtained from individuals belonging to the three main ethnicities of Rio Grande do Sul (RS) State, southern Brazil, as well as subjects with multiethnic backgrounds, were first investigated to explore genetic relationships and structures within and among them. Subsequently, population stratification analysis and individual ancestry inference were conducted against the reference population set.

3.1. Forensic parameters of 165 SNPs for Rio Grande do Sul (Brazil) main population groups

The detailed genotypes of the 165 AISNPs for the 250 Brazilian subjects from RS are listed in Supplementary Table S2. Observed allele frequencies and forensic parameter estimates of these SNPs, including Ho, He, PIC, MP, PD, PE, and TPI for individual population groups, are presented in Supplementary Tables S3–S6, as well as p-values for HWE tests for all loci.

Two loci (rs1800414 and rs671) are monomorphic in all four populations investigated. rs3811801 is monomorphic in the AFRS, AMRS, and ADRS subsets. rs1871534, rs3916235, and rs7657799 are monomorphic only in Amerindian-derived individuals. The invariable loci rs1800414, rs671, and rs3811801 were also monomorphic for the same alleles in Basques [30], Danes, Somalis [32], Greenlanders [45], and Ecuadorians [46]. Further inquiries at the Ensembl Genome Browser (Release 99) showed that these three markers have the same fixed allele in all European, Native American, and African samples reported to date, while they are polymorphic in East Asian populations. Therefore, the lack of genetic variability in these loci should not be extrapolated to the Brazilian population as a whole, as the sampling of this study was conducted in a single Brazilian federative unit (out of 27), particularly the one presenting the lowest rate of Asian ethnic composition, as reported by the Brazilian Institute of Geography and Statistics (IBGE) demographic census [48]. These loci are expected to be variable in samples from southeastern Brazil, for instance, given the historical presence of Asian immigrants in this particular region [49].

No statistically significant deviation from HWE was observed after Bonferroni correction (p > 3.03 × 10⁻⁴) in any ethnic subset. However, seven loci pairs displayed linkage disequilibrium in pairwise LD testing, even after Bonferroni correction for multiple comparisons (p < 3.70 × 10⁻⁶): three pairs in AFRS (two of them also genetically associated in ADRS), three in EURS, and an extra pair in Admixed-derived Gauchos (Table 1). Four out of seven pairs are located on the same chromosome, up to 3.5 cM apart from each other: rs1834619–rs1876482 (Chr. 2), rs260690–rs3827760 (Chr. 2), rs1426654–rs735480 (Chr. 15), and rs3916235–rs4891825 (Chr. 18). The latter pair also showed statistically significant LD in Basques [30], Danes, and Somalis [32]. Overall, full recombination and independent inheritance are expected in loci with a genetic distance of over 50 cM [50]. Nevertheless, the aforementioned statistically associated SNPs are located at markedly shorter distances. Besides physical linkage between loci, such non-random associations can be caused by, among other reasons, gene flow among populations with dissimilar allele frequencies, population structure, and small sample sizes [51]. Brazilian populations display varying levels of stratification and complex admixing patterns [52]; therefore, a conjunction of the foregoing factors is presumably inducing the genetic associations observed between seven loci pairs in three RS population groups. LD test p-values are detailed in Supplementary Tables S8–S11. The AMRS population presented no significant association among loci after Bonferroni correction.

Observed heterozygosity (Ho) ranges from 0.048 (rs1229984 and rs4471745) to 0.661 (rs1040045) in AFRS, from 0.011 (rs3811801) to 0.576 (rs3784230) in EURS, from 0.045 (rs7251928 and rs7722456) to 0.727 (rs7745461 and rs948028) in AMRS, and from 0.095 (rs1229984) to 0.662 (rs1871428) in ADRS, with average values of 0.361 ± 0.133, 0.274 ± 0.144, 0.355 ± 0.146, and 0.388 ± 0.116, respectively. As expected, the Admixed-derived group (ADRS) has the highest average intrapopulational genetic diversity, followed by the African-derived one. Notably, heterozygosity values indicate greater miscegenation among the South Brazilian Amerindians compared to the European-derived population group. These results reflect the admixed landscape characterizing the Brazilian population and corroborate previous findings regarding genetic variability in the RS population and other Brazilian regions [53–56]. The SNP with the highest discrimination power was rs3916235 (PD = 0.658; MP = 0.342) in AFRS, rs459920 (PD = 0.661; MP = 0.339) in EURS, rs37369 (PD = 0.6653; MP = 0.3347) in AMRS, and rs7554936 (PD = 0.6622; MP = 0.3378) in ADRS. Combined match probability (CMP) was, in the same order as the groups above, 2.45 × 10⁻⁵¹, 8.62 × 10⁻⁴⁰, 1.20 × 10⁻⁴⁸, and 8.82 × 10⁻⁵⁶. In African-derived

Table 1
Genetically associated SNP pairs in RS State (Brazil) main population groups. Four out of seven pairs are located on the same chromosome (position based on hg19 genome). P-values for linkage disequilibrium tests are also provided.

Population  Locus #1    Locus #1 location   Locus #2    Locus #2 location   P-value LD
AFRS        rs1572018   Chr13: 41715282     rs2166634   Chr10: 118436068    1.76 × 10⁻⁶
AFRS        rs1834619   Chr2: 17901485      rs1876482   Chr2: 17362568      7.07 × 10⁻⁹
AFRS        rs3916235   Chr18: 67578931     rs4891825   Chr18: 67867663     1.74 × 10⁻⁹
EURS        rs1407434   Chr1: 186149032     rs3827760   Chr2: 109513601     1.03 × 10⁻⁶
EURS        rs260690    Chr2: 109579738     rs3827760   Chr2: 109513601     1.11 × 10⁻¹¹
EURS        rs4471745   Chr17: 53568884     rs731257    Chr7: 12669251      2.01 × 10⁻⁶
ADRS        rs1426654   Chr15: 48426484     rs735480    Chr15: 45152371     6.22 × 10⁻⁷
ADRS        rs1834619   Chr2: 17901485      rs1876482   Chr2: 17362568      4.22 × 10⁻⁸
ADRS        rs3916235   Chr18: 67578931     rs4891825   Chr18: 67867663     1.66 × 10⁻¹¹

AFRS = African-derived Gauchos; EURS = European-derived Gauchos; ADRS = Admixed-derived Gauchos.
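The two Bonferroni-corrected thresholds quoted for the HWE and LD tests can be reproduced with a short calculation. The sketch below assumes that the number of HWE tests equals the 165 SNPs and that the number of LD tests equals all unordered pairs of those 165 SNPs; the text only states that 0.05 was divided by the number of tests, so these counts are an inference from the reported thresholds, and the variable names are illustrative. The p-values are those listed in Table 1.

```python
from math import comb

ALPHA = 0.05    # nominal significance level stated in the methods
N_SNPS = 165    # markers in the panel


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Assumption: one HWE test per SNP, one LD test per unordered SNP pair
n_pairs = comb(N_SNPS, 2)
hwe_threshold = bonferroni_threshold(ALPHA, N_SNPS)
ld_threshold = bonferroni_threshold(ALPHA, n_pairs)

print(f"HWE threshold: {hwe_threshold:.2e}")                 # 3.03e-04
print(f"LD threshold ({n_pairs} pairs): {ld_threshold:.2e}")  # 3.70e-06

# LD p-values as listed in Table 1 (AFRS, EURS, ADRS rows in order)
table1_p_values = [
    1.76e-6, 7.07e-9, 1.74e-9,
    1.03e-6, 1.11e-11, 2.01e-6,
    6.22e-7, 4.22e-8, 1.66e-11,
]
all_significant = all(p < ld_threshold for p in table1_p_values)
print("All Table 1 pairs below the LD threshold:", all_significant)
```

Under these assumptions the computed values match the reported 3.03 × 10⁻⁴ and 3.70 × 10⁻⁶, and every pair in Table 1 falls below the LD threshold.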
Gauchos, the SNP with the highest power of exclusion was rs1040045 (PE = 0.3710), while in EURS it was rs3784230 (PE = 0.2632). In the AMRS population, the SNPs with the highest PE were rs948028 and rs7745461, both with a PE value of 0.4717. Combined power of exclusion (CPE) of the 165 SNPs included in the Precision ID Ancestry Panel was, for AFRS, EURS, AMRS, and ADRS: 99.99999960%, 99.99954437%, 99.99999967%, and 99.99999995%, respectively. CMP and CPE metrics can be regarded as indicators of the efficiency of genetic markers in forensic individualization. Forensic descriptive parameters of the Precision ID Ancestry Panel (TFS; Waltham, MA, USA) estimated for each population group revealed that, although its primary purpose is biogeographic ancestry inference (whereas for an identification tool it is more suitable to use other panels, for instance, the Precision ID Identity Panel [57]), this panel has enough polymorphic and informative SNPs to be used as a supplementary instrument for individual identification in the forensic analytical repertoire.

Moreover, average random match probability (RMP) based on individual genotypic frequencies for all 165 SNPs was calculated for the AFRS, EURS, AMRS, and ADRS populations and for RS State as a whole (RSBR). For the latter, an adjusted allele frequency table was generated considering the relative contribution of each aforementioned group to the formation of the RS population, according to the IBGE demographic census [58] (Supplementary Table S7). Results are presented in Table 2. A considerable overlap between ADRS RMPs in the three main ethnic populations (EURSPop., AFRSPop., and AMRSPop.) can be observed, corroborating a trihybrid composition for the admixed nature of this population sample. On the other hand, the average probabilities of AFRS, EURS, and AMRS genetic profiles occurring in populations other than their own (and ADRSPop., for AFRS and EURS profiles) are at least 25 orders of magnitude lower. Furthermore, EURS and ADRS are the most frequent genetic profiles found in the RSBR population. Wright’s F-statistics (discussed later) shed light on the above outcomes regarding forensic aspects of the four RS population groups.

3.2. Interpopulation genetics analyses

Based on the Precision ID Ancestry Panel (TFS; Waltham, MA, USA), pairwise FST for the RS main population groups ranged from 0.07191 (AFRS and ADRS) to 0.38631 (EURS and AMRS). Table 3 presents the results obtained with the pairwise FST test in the investigated population groups. Overall, the Amerindian population was found to be the most genetically distinct and structured, with consistently higher observed pairwise FST values, followed by European, African, and Admixed ethnicities, respectively. Considering the 165 ancestry-informative markers evaluated, there is a remarkable genetic differentiation level among population groups derived from the three main parental populations that bolstered the

Table 3
Pairwise FST test for RS State (Brazil) main population groups based on 165 SNPs of Precision ID Ancestry Panel (TFS; Waltham, MA, USA). FST values are presented in lower-left diagonal, while upper-right diagonal exhibits the significance matrix (p = 0.00000).

Population  AFRS      EURS      AMRS      ADRS
AFRS                  +         +         +
EURS        0.26051             +         +
AMRS        0.30261   0.38631             +
ADRS        0.07191   0.07702   0.23357

AFRS = African-derived Gauchos; EURS = European-derived Gauchos; ADRS = Admixed-derived Gauchos.
Significant values are represented by the “+” sign.

Furthermore, the trihybrid multiethnic group displays tighter (and quite similar) genetic relationships with African-derived and European-derived population groups, and a more distant one (albeit closer than the others) with the Amerindian group. These findings contrast with the results of a previous study concerning color and genomic ancestry in Brazilians [59], in which no statistically significant degree of genetic differentiation was observed among individuals classified as Whites, Intermediates, and Blacks from São Paulo city, southeastern Brazil, by typing of 12 STR loci. The use of forensic STRs for delineating population structure may explain the disparities found, as markers with relatively lower mutation rates (SNP, Alu, Indel) are more suitable to provide biogeographic resolution at the continental level [60].

Furthermore, pairwise FST values were calculated based on Precision ID Ancestry Panel (TFS; Waltham, MA, USA) SNPs among the RS main population groups and 89 worldwide reference populations (see Supplementary Table S1 for details). Results are displayed as a heatmap in Supplementary Fig. S1 and pairwise FST values are detailed in Supplementary Table S12. African-derived Gauchos showed higher similarity levels with African Americans (AFRS–ASW: FST = 0.0162; AFRS–AAM: FST = 0.0177), followed by Eastern African Somalis (AFRS–SOM: FST = 0.0372) and Ethiopian Jews (AFRS–ETJ: FST = 0.0434), and more conspicuous divergence from Native American Suruí and Karitiana from the Amazon region (AFRS–SUR: FST = 0.4599; AFRS–KAR: FST = 0.4366). European-derived Gauchos, on the other hand, presented more genetic proximity with Central and Southern European populations (EURS–HGR: FST = 0.0017; EURS–GRK: FST = 0.0049; and EURS–TSI: FST = 0.0055), followed by European Americans (EURS–EAM: FST = 0.0058), and the highest differentiation levels from Native American Suruí (EURS–SUR: FST = 0.5325) and Biaka, pygmies from Central Africa (EURS–BIA: FST = 0.5309). Brazilians with Amerindian ethnicity from RS State displayed more prominent genetic similarity with Peruvians (AMRS–PEL: FST = 0.0201), Maya (AMRS–MAY: FST = 0.0238), and Quechua (AMRS–QUE:
peopling of Brazil (Europeans, Africans, and Native Americans). FST = 0.0269), followed by North American Plains Amerindians
(AMRS–NPA: FST = 0.0494), corroborating the admixed nature of AMRS
population group. Higher divergence levels were observed with Biaka pygmies and Western Africans (AMRS–BIA: FST = 0.5991; AMRS–ESN: FST = 0.5660; AMRS–YOR: FST = 0.5622). Admixed-derived Gauchos, characterized by miscegenation among two or three of the main Brazilian ethnic roots (European, African, and Amerindian), revealed higher similarity with Puerto Ricans and Colombians (ADRS–PUR: FST = 0.0128; ADRS–CLM: FST = 0.0267), and more evident population differentiation levels with Native Americans from the Amazon region (ADRS–SUR: FST = 0.4005; ADRS–KAR: FST = 0.3744).
To further investigate the above results regarding interpopulation genetic relationships of RS State main ethnicities and 89 worldwide populations, an MDS plot was drawn based on pairwise FST values (Fig. 1). The MDS graph exhibits positive values in Dimension 1 as a characteristic feature of African (AFR) populations. Sub-Saharan African populations are closely clustered at the bottom-right edge of the quadrant, while admixed AFR populations have broader dispersion along the axis. The AFRS population is relatively close to admixed East African populations (SOM and ETJ) and African Americans (AAM and ASW) in Dimension 1

Table 2
Average random match probability (RMP) of genetic profiles from each RS State population group in each ethnic population and in RS population as a whole (RSBR; adjusted allele frequencies), based on allele frequencies of the 165 SNPs included in Precision ID Ancestry Panel (TFS; Waltham, MA, USA).

            AFRSProf.              EURSProf.              AMRSProf.              ADRSProf.
AFRSPop.    3.62E-50 ± 2.03E-49    4.18E-82 ± 2.88E-81    1.16E-88 ± 9.05E-88    8.47E-60 ± 6.42E-59
EURSPop.    7.89E-75 ± 7.48E-74    7.67E-41 ± 5.40E-40    1.37E-95 ± 1.30E-94    7.67E-54 ± 7.15E-53
AMRSPop.    1.63E-88 ± 7.39E-88    7.51E-84 ± 3.44E-83    2.36E-48 ± 1.07E-47    4.99E-70 ± 2.29E-69
ADRSPop.    2.15E-56 ± 1.50E-55    1.56E-48 ± 1.34E-47    3.35E-87 ± 2.85E-86    3.14E-55 ± 2.60E-54
RSBRPop.    1.27E-72 ± 7.44E-72    6.23E-44 ± 3.07E-43    8.65E-78 ± 3.97E-77    3.05E-49 ± 2.61E-48

AFRS = African-derived Gauchos; EURS = European-derived Gauchos; AMRS = Amerindian-derived Gauchos; ADRS = Admixed-derived Gauchos. Prof. = Profile; Pop. = Population.
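As an illustration of how a profile probability of the kind reported in Table 2 could be obtained, the following minimal sketch multiplies per-SNP genotype frequencies across loci. It assumes Hardy-Weinberg proportions and independence between loci, which the text does not state explicitly as the computation used; the group names, allele frequencies, census weights, and profile below are hypothetical toy values, not data from this study. The weighted average mirrors the idea of an adjusted allele frequency table built from relative group contributions.

```python
import math


def genotype_frequency(alt_allele_count, alt_freq):
    """Expected genotype frequency at a biallelic SNP under Hardy-Weinberg equilibrium."""
    ref_freq = 1.0 - alt_freq
    if alt_allele_count == 0:
        return ref_freq ** 2
    if alt_allele_count == 1:
        return 2.0 * alt_freq * ref_freq
    if alt_allele_count == 2:
        return alt_freq ** 2
    raise ValueError("alt_allele_count must be 0, 1 or 2")


def log10_profile_probability(profile, alt_freqs):
    """log10 of the product of genotype frequencies (loci assumed independent)."""
    total = 0.0
    for count, freq in zip(profile, alt_freqs):
        g = genotype_frequency(count, freq)
        if g == 0.0:
            return float("-inf")  # genotype impossible under these frequencies
        total += math.log10(g)
    return total


def weighted_frequencies(group_freqs, weights):
    """Per-locus allele frequencies averaged over groups, weighted by group contribution."""
    total_weight = sum(weights[g] for g in group_freqs)
    n_loci = len(next(iter(group_freqs.values())))
    return [
        sum(weights[g] * group_freqs[g][i] for g in group_freqs) / total_weight
        for i in range(n_loci)
    ]


# Hypothetical toy data: two groups, three SNPs (alternate-allele frequencies).
toy_freqs = {"GroupA": [0.10, 0.80, 0.55], "GroupB": [0.70, 0.20, 0.50]}
toy_weights = {"GroupA": 3.0, "GroupB": 1.0}  # hypothetical relative contributions
toy_profile = [0, 2, 1]  # alternate-allele counts per SNP

for name, freqs in toy_freqs.items():
    print(name, log10_profile_probability(toy_profile, freqs))
pooled = weighted_frequencies(toy_freqs, toy_weights)
print("pooled", log10_profile_probability(toy_profile, pooled))
```

Working in log10 keeps the arithmetic stable when many loci are multiplied, and the result maps directly onto the orders of magnitude compared in the text.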
A.B. Felkl et al. Forensic Science International: Genetics 64 (2023) 102838
Fig. 1. Genetic distance evaluation among RS State (Brazil) main population groups and 89 worldwide populations, presented as an MDS plot based on pairwise FST
values for 165 SNPs included in Precision ID Ancestry Panel (TFS; Waltham, MA, USA). Genetic distances between all pairs of populations were included, and a
multidimensional scaling procedure was applied to reduce dimensionality, from an n-dimensional space to a Cartesian space. Spatial proximity in the plot indicates genetic
similarity between populations, while genetically distant populations are located farther apart.
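The path from allele frequencies to an MDS plot can be sketched in two steps: build a matrix of pairwise FST values, then project it to a few dimensions. The sketch below is illustrative only. It swaps in Hudson's ratio-of-averages FST estimator without sample-size correction and classical (Torgerson) MDS, neither of which is claimed to be the estimator or scaling algorithm used in this study; the population labels and allele frequencies are hypothetical.

```python
import numpy as np


def hudson_fst(p1, p2):
    """Hudson's FST from population allele frequencies (ratio of averages,
    no sample-size correction; a simplification for illustration)."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    numerator = (p1 - p2) ** 2
    denominator = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return numerator.sum() / denominator.sum()


def pairwise_fst(freqs):
    """Symmetric matrix of pairwise FST values for a dict {population: frequencies}."""
    names = list(freqs)
    n = len(names)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = hudson_fst(freqs[names[i]], freqs[names[j]])
    return names, matrix


def classical_mds(dist, k=2):
    """Classical MDS: double-center the squared distances, then keep the
    top-k eigenvectors scaled by the square root of their eigenvalues."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:k]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical allele frequencies for four populations at five SNPs.
toy_freqs = {
    "PopA": [0.90, 0.10, 0.80, 0.20, 0.70],
    "PopB": [0.10, 0.85, 0.15, 0.75, 0.30],
    "PopC": [0.20, 0.30, 0.90, 0.10, 0.95],
    "PopD": [0.50, 0.45, 0.55, 0.40, 0.60],  # intermediate, admixed-like
}

names, fst_matrix = pairwise_fst(toy_freqs)
coords = classical_mds(fst_matrix, k=2)
for name, (dim1, dim2) in zip(names, coords):
    print(f"{name}: Dimension 1 = {dim1:.3f}, Dimension 2 = {dim2:.3f}")
```

Because FST is not a true Euclidean distance, the low-dimensional coordinates only approximate the pairwise values; the sign and orientation of each dimension are arbitrary.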
inference methods, based on 5000 individuals from 93 worldwide populations. Fig. 2 presents populational bar charts of estimated cluster membership values from STRUCTURE runs for Brazilian samples alongside 89 reference populations. Estimates are based on individual genotypes for all 165 ancestry-informative SNPs composing Precision ID Ancestry Panel (TFS; Waltham, MA, USA). The optimal number of clusters according to the Evanno method is K = 3, although higher K values successfully partitioned the populations into further continental (or even more geographically refined) divisions. When considering runs ranging from K = 5–20, Structure Harvester results indicate K = 7 as optimal K number (Supplementary Fig. S2). At K = 2 (data not shown) African and non-African ancestry components could be identified. At K = 3, African (blue), European (green), and Native American/Asian (red) ancestry components are discernible.
At optimal K number of 3, AFRS presents clustering patterns similar to adjoining African-American subpopulations (ACB, AAM, and ASW), although the green (EUR) and red (NAM/Asian) components are more pronounced, suggesting a higher admixing level among the parental populations that originated African-derived Gauchos than in African Americans. Within the AFRS, ancestry proportions range from AFR: 33.7–86.1%, EUR: 1.5–51.3%, and NAM: 0–52.4%. EURS displays a very similar clustering pattern to that of the North American counterpart (EAM) and European populations, with an almost total predominance of the European component. Besides, a low NAM/Asian ancestry is also noticeable, and almost no AFR component is perceived. Indeed, within EURS, ancestry proportions vary from AFR: 0–1.4%, EUR: 65.9–99.5%, and NAM: 0–33.2%. AMRS, like the Peruvians (PEL), exhibits a substantial NAM/Asian component. Individually, AMRS ancestry proportions range from 0% to 9.8% (AFR), 0–44.2% (EUR), and 55–99.4% (NAM). ADRS presents a well-defined admixed pattern, with the three ancestry components clearly discernible. There is a prevalence of EUR composition, followed by AFR and NAM/Asian, respectively, corroborating results obtained with the MDS chart. At individual level, ancestry proportions vary from AFR: 0–61.8%, EUR: 19.0–89.9%, and NAM: 0–43.8%. Average ancestry estimates of RS State population samples (AFRS, EURS, AMRS, and ADRS) were inferred based on both optimal K values and are presented in Table 5. Results were extracted from runs with the largest Ln Probability Data [LnP(D)].
At K = 7, 8, and 9, the Central African, North African/Middle Eastern, Central and North Asia, and Pacific ancestry components
Fig. 2. Population structure of RS State (Brazil) main population groups along with 89 worldwide populations, based on 165 SNPs included in Precision ID Ancestry
Panel (TFS; Waltham, MA, USA). STRUCTURE plots are presented with cluster (K) number ranging from 3 to 9 (top to bottom; data for K = 6 and K = 8 not shown).
The optimal number of clusters was three. Each vertical line stands for an individual, with colors representing the relative proportion of association with each
inferred cluster. Populations referring to each number and respective geographic locations are listed in Supplementary Table S1.
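The choice of an optimal K by the Evanno method can be illustrated with a short sketch of the ΔK statistic, taken here as the absolute second-order difference of the mean LnP(D) across replicate runs divided by the standard deviation of LnP(D) at that K. Formulations differ on whether the absolute value is taken per run or on the means; this sketch uses the means. The LnP(D) values below are hypothetical toy numbers, not output from the STRUCTURE runs described in this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K for every K that has both neighbours (K - 1 and K + 1).

    lnp_by_k maps K to a list of LnP(D) values from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the ends of the tested range
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # avoid division by zero when replicates are identical
        second_difference = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_difference / spread
    return delta


# Hypothetical LnP(D) values: three replicate runs for each K from 1 to 5.
toy_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4503.0, -4497.0],
    3: [-4100.0, -4101.0, -4099.0],
    4: [-4090.0, -4094.0, -4086.0],
    5: [-4085.0, -4080.0, -4090.0],
}

delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
for k, value in delta_k.items():
    print(f"K = {k}: Delta K = {value:.2f}")
print("K with the highest Delta K:", best_k)
```

Since ΔK cannot be computed for the smallest and largest K tested, the range of K explored determines which values are eligible, which is one reason different run ranges can point to different optimal K values.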
criminal samples as an influence factor in quality metrics, Forensic Sci. Int. 303 (2019), 109938.
[23] K. Wasik, T. Berisa, J.K. Pickrell, et al., Comparing low-pass sequencing and genotyping for trait mapping in pharmacogenetics, BMC Genom. 20-22 (1) (2021) 197.
[24] N. Li, M. Stephens, Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data, Genetics 165 (4) (2003) 2213–2233.
[25] H.E. Lischer, L. Excoffier, PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs, Bioinformatics 28 (2) (2012) 298–299.
[26] A. Gouy, M. Zieger, STRAF - A convenient online tool for STR data evaluation in forensic genetics, Forensic Sci. Int. Genet. 30 (2017) 148–151.
[27] L. Excoffier, H.E. Lischer, Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows, Mol. Ecol. Resour. 10 (3) (2010) 564–567.
[28] C.E. Bonferroni, Teoria statistica delle classi e calcolo delle probabilità, Pubbl. Del. Reg. Ist. Super. di Sci. Econ. e Commer. di Firenze 8 (1936) 3–62.
[29] G.R. Abecasis, A. Auton, et al., 1000 Genomes Project Consortium, An integrated map of genetic variation from 1,092 human genomes, Nature 491 (7422) (2012) 56–65.
[30] O. García, J.A. Ajuriagerra, A. Alday, et al., Frequencies of the precision ID ancestry panel markers in Basques using the Ion Torrent PGM™ platform, Forensic Sci. Int. Genet. 31 (2017) e1–e4.
[31] G. He, Z. Wang, M. Wang, et al., Forensic ancestry analysis in two Chinese minority populations using massively parallel sequencing of 165 ancestry-informative SNPs, Electrophoresis 39 (21) (2018) 2732–2742.
[32] V. Pereira, H.S. Mogensen, C. Børsting, N. Morling, Evaluation of the Precision ID Ancestry Panel for crime case work: a SNP typing assay developed for typing of 165 ancestral informative markers, Forensic Sci. Int. Genet. 28 (2017) 138–145.
[33] A.J. Pakstis, W.C. Speed, U. Soundararajan, et al., Population relationships based on 170 ancestry SNPs from the combined Kidd and Seldin panels, Sci. Rep. 9 (2019) 18874.
[34] IBM Corp., IBM SPSS Statistics for Windows, Version 25.0, Released 2017, Armonk, NY.
[35] J.K. Pritchard, M. Stephens, P. Donnelly, Inference of population structure using multilocus genotype data, Genetics 155 (2) (2000) 945–959.
[36] D. Falush, M. Stephens, J.K. Pritchard, Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies, Genetics 164 (4) (2003) 1567–1587.
[37] N.M. Kopelman, J. Mayzel, M. Jakobsson, N.A. Rosenberg, I. Mayrose, Clumpak: a program for identifying clustering modes and packaging population structure inferences across K, Mol. Ecol. Resour. 15 (5) (2015) 1179–1191.
[38] D.A. Earl, B.M. vonHoldt, Structure Harvester: a website and program for visualizing structure output and implementing the Evanno method, Conserv. Genet. Resour. 4 (2) (2011) 359–361.
[39] G. Evanno, S. Regnaut, J. Goudet, Detecting the number of clusters of individuals using the software structure: a simulation study, Mol. Ecol. 14 (8) (2005) 2611–2620.
[40] H. Nakanishi, V. Pereira, C. Børsting, et al., Analysis of mainland Japanese and Okinawan Japanese populations using the precision ID Ancestry Panel, Forensic Sci. Int. Genet. 33 (2018) 106–109.
[41] Z. Wang, G. He, T. Luo, et al., Massively parallel sequencing of 165 ancestry informative SNPs in two Chinese Tibetan-Burmese minority ethnicities, Forensic Sci. Int. Genet. 34 (2018) 141–147.
[42] H. Simayijiang, C. Børsting, T. Tvedebrink, N. Morling, Analysis of Uyghur and Kazakh populations using the Precision ID Ancestry Panel, Forensic Sci. Int. Genet. 43 (2019), 102144.
[43] T. Xie, C. Shen, C. Liu, et al., Ancestry inference and admixture component estimations of Chinese Kazak group based on 165 AIM-SNPs via NGS Platform, J. Hum. Genet. (2020).
[44] J.H. Lee, S. Cho, M.Y. Kim, et al., Genetic resolution of applied biosystems™ precision ID Ancestry panel for seven Asian populations, Leg. Med. 34 (2018) 41–47.
[45] G. Espregueira Themudo, H. Smidt Mogensen, C. Børsting, N. Morling, Frequencies of HID-ion ampliseq ancestry panel markers among greenlanders, Forensic Sci. Int. Genet. 24 (2016) 60–64.
[46] R. Santangelo, F. González-Andrade, C. Børsting, A. Torroni, V. Pereira, N. Morling, Analysis of ancestry informative markers in three main ethnic groups from Ecuador supports a trihybrid origin of Ecuadorians, Forensic Sci. Int. Genet. 31 (2017) 29–33.
[47] D.M. Truelsen, M.S. Farzad, H.S. Mogensen, et al., Typing of two Middle Eastern populations with the Precision ID Ancestry Panel, Forensic Sci. Int. Genet. 6 (2017) e301–e302.
[48] IBGE. Sistema IBGE de Recuperação Automática - SIDRA. Tabela 136 - População residente por cor ou raça (2010). Instituto Brasileiro de Geografia e Estatística.
[49] IBGE. Brasil: 500 anos de povoamento (2007). Rio de Janeiro: Instituto Brasileiro de Geografia e Estatística.
[50] C. Phillips, D. Ballard, P. Gill, D.S. Court, A. Carracedo, M.V. Lareu, The recombination landscape around forensic STRs: Accurate measurement of genetic distances between syntenic STR pairs using HapMap high density SNP data, Forensic Sci. Int. Genet. 6 (3) (2012) 354–365.
[51] K.G. Ardlie, L. Kruglyak, M. Seielstad, Patterns of linkage disequilibrium in the human genome, Nat. Rev. Genet. 3 (4) (2002) 299–309.
[52] F. Saloum de Neves Manta, R. Pereira, R. Vianna, et al., Revisiting the genetic ancestry of Brazilians using autosomal AIM-Indels, PLoS One 8 (9) (2013), e75145.
[53] F.C. Parra, R.C. Amado, J.R. Lambertucci, J. Rocha, C.M. Antunes, S.D. Pena, Color and genomic ancestry in Brazilians, Proc. Natl. Acad. Sci. USA 100 (1) (2003) 177–182.
[54] S.D. Pena, G. Di Pietro, M. Fuchshuber-Moraes, et al., The genomic ancestry of individuals from different geographical regions of Brazil is more uniform than expected, PLoS One 6 (2) (2011), e17063.
[55] Y.C. Muniz, L.B. Ferreira, C.T. Mendes-Junior, C.E. Wiezel, A.L. Simões, Genomic ancestry in urban Afro-Brazilians, Ann. Hum. Biol. 35 (1) (2008) 104–111.
[56] C.C. Gontijo, F.M. Mendes, C.A. Santos, et al., Ancestry analysis in rural Brazilian populations of African descent, Forensic Sci. Int. Genet. 36 (2018) 160–166.
[57] E. Avila, A.B. Felkl, P. Graebin, C.P. Nunes, C.S. Alho, Forensic characterization of Brazilian regional populations through massive parallel sequencing of 124 SNPs included in HID ion Ampliseq Identity Panel, Forensic Sci. Int. Genet. 40 (2019) 74–84.
[58] IBGE, 2010. Sistema IBGE de Recuperação Automática - SIDRA. Tabela 136 - População residente por cor ou raça (2010). Instituto Brasileiro de Geografia e Estatística.
[59] J.R. Pimenta, L.W. Zuccherato, A.A. Debes, et al., Color and genomic ancestry in Brazilians: a study with forensic microsatellites, Hum. Hered. 62 (4) (2006) 190–195.
[60] A. Moriot, C. Santos, A. Freire-Aradas, C. Phillips, D. Hall, Inferring biogeographic ancestry with compound markers of slow and fast evolving polymorphisms, Eur. J. Hum. Genet. 26 (11) (2018) 1697–1707.
[61] A.J. Pakstis, C. Gurkan, M. Dogan, et al., Genetic relationships of European, Mediterranean, and SW Asian populations using a panel of 55 AISNPs, Eur. J. Hum. Genet. 27 (12) (2019) 1885–1893.