Estimation of A Significance Threshold For Genome-Wide Association Studies

Kaler and Purcell BMC Genomics (2019) 20:618
https://doi.org/10.1186/s12864-019-5992-7
METHODOLOGY ARTICLE Open Access
Estimation of a significance threshold for genome-wide association studies
Avjinder S. Kaler and Larry C. Purcell*
Abstract
Background: Selection of an appropriate statistical significance threshold in genome-wide association studies is
critical to differentiate true positives from false positives and false negatives. Different multiple testing comparison
methods have been developed to determine the significance threshold; however, these methods may be overly
conservative and may lead to an increase in false negatives. Here, we developed an empirical formula to determine
the statistical significance threshold that is based on the marker-based heritability of the trait. To develop a formula
for a significance threshold, we used 45 simulated traits in soybean, maize, and rice that varied in both broad sense
heritability and the number of QTLs.
Results: A formula to determine a significance threshold was developed based on a regression equation that used
one independent variable, marker-based heritability, and one response variable, − log10 (P)-values. For all species,
the threshold –log10 (P)-values increased as both marker-based and broad-sense heritability increased. Higher broad
sense heritability in these crops resulted in higher significant threshold values. Among crop species, maize, with a
lower linkage disequilibrium pattern, had higher significant threshold values as compared to soybean and rice.
Conclusions: Our formula was less conservative and identified more true positive associations than the false
discovery rate and Bonferroni correction methods.
Keywords: Genome-wide association studies, Significant threshold, Bonferroni correction, False discovery rate,
Heritability, Single nucleotide polymorphisms
* Correspondence: [email protected]
Department of Crop, Soil, and Environmental Sciences, University of Arkansas, Fayetteville, AR 72704, USA
© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background
Linkage mapping (LM) and genome-wide association studies (GWAS) are the two most popular methods to decipher genetic architectures of complex traits in crops [1]. With advancements in high throughput genotyping and sequencing technologies, single nucleotide polymorphisms (SNPs) provide relatively low cost and dense marker coverage across various genomes [2]. Association mapping has several advantages over the traditional LM, including increased mapping resolution, broader allele coverage, and reduced time and costs to establish tedious and expensive biparental mapping populations [3].
A major problem in GWAS is false positives that arise from population structure and family relatedness. Several statistical models have been developed to control false positives in GWAS. The mixed linear model (MLM) has become the most popular approach because of its ability to consider population structure and family relatedness [3, 4]. Since the publication of MLM for GWAS [3], many MLM-based methods have been developed. All these methods are single-locus, which test one marker at a time, and these methods fail to match the true genetic model of complex traits that are controlled by many loci simultaneously. To overcome this problem, multi-locus models, including FASTmrEMMA [5], ISIS EM-BLASSO [6], pLARmEB [7], pKWmEB [8], LASSO [9], and FarmCPU [10], have been developed.
Determining the correct P-value threshold for statistical significance is critical to differentiate true positives from false positives and false negatives. To determine the statistical significance threshold in GWAS, different statistical procedures accounting for multiple testing have been proposed, including the Bonferroni correction, Sidak correction, False Discovery Rate (FDR), permutation test, and Bayesian approaches. Bonferroni correction and FDR [11–15] are the two most commonly used methods for crops. All of these methods
limit type 1 errors (false positives), but they almost certainly inflate type 2 errors (false negatives) [16].
The Bonferroni correction method is considered the most conservative method for selecting a threshold P-value due to the assumption that every genetic variant tested is independent of the rest. The False Discovery Rate controls the expected proportion of false positives among the rejected null hypotheses and is a popular, less conservative approach compared to the Bonferroni correction [15]. However, FDR also assumes independence of hypotheses; therefore, if many SNPs in strong linkage disequilibrium (LD) are present on an array, it can suffer from a loss of statistical power and generate false negatives [17]. An imbalance of error rates permitting an excess of false negatives may be more problematic in the long term because type 1 errors are more easily identified in subsequent studies, and the resources necessary to perform other large GWAS needed to overcome the bias toward type 2 errors are finite [16]. Additionally, the variants tested in a study are inevitably dependent on population-specific factors, such as LD pattern and minor allele frequency (MAF), suggesting that the appropriate threshold for genome-wide significance might vary for different populations and crop species. For example, the threshold for a crop with a lower LD pattern, such as maize (Zea mays L.), should be more stringent than for a population with a higher LD pattern, such as soybean (Glycine max L.) or rice (Oryza sativa L.), as the number of independent markers tends to be greater in maize than in soybean. LD decayed (to r² = 0.25) over a much shorter distance in maize (1 kb) [18] than in soybean (150 kb in euchromatic and 5,000 kb in heterochromatic regions) [19–21] or rice (123 kb) [22]. Therefore, there is a need to develop a method that can select an appropriate significance threshold value for GWAS to differentiate true positives from false positives and false negatives.
As trait complexity increases, the number of loci affecting the trait increases along with environmental interactions, with an expected decrease in heritability. Conversely, for less complex traits, fewer loci affect the trait, there is less interaction with the environment, and there is an expected increase in heritability. For a trait with a high heritability, the threshold value for significance of associating loci with a trait would have high −log10(P)-values, and vice versa for a complex trait with low heritability.
Here, we develop an empirical formula to determine the statistical significance threshold that is based on the marker-based heritability of the trait. The objective of this study was to develop an empirical formula that can determine the statistical significance thresholds for GWAS using a large number of simulated phenotypes that varied in heritability and the number of QTLs for soybean, maize, and rice. These crops were selected because of differences in LD pattern, with maize having a lower LD pattern compared with soybean and rice. The phenotypes were simulated and associated with freely-available SNP marker datasets for all these crops.

Results and discussion
In this study, we developed a method to determine the significance threshold value for GWAS using the 45 simulated phenotypic traits that varied in both the broad sense heritability and the number of QTLs in three crop species that differed in their LD patterns. We repeated the simulation of these traits 10 times so that simulated QTLs were randomly assigned to different parts of the genome in order to obtain unbiased results.
For the same simulated trait in different repetitions, there were different marker-based heritabilities and different significant −log10(P)-values (where all simulated QTLs in that trait were present) (Fig. 1). There were strong positive associations between broad sense heritability and significance threshold values. That is, the higher the broad sense heritability, the higher the −log10(P)-values for all three crops (Table 1). Significance threshold values (−log10(P)) also increased among the crop species for these simulated traits as the LD decreased. Specifically, maize had higher significance threshold (−log10(P)) values as compared to soybean and rice for simulated traits when they had more than 50% broad sense heritability (Table 1), which corresponded inversely with LD patterns.
Using both broad-sense heritability and marker-based heritability as independent variables and the selected significance threshold (−log10(P)) value as the response variable in the multiple regression analysis, we obtained an equation for determining significance threshold values in GWAS for each crop. We observed that marker-based heritability showed a significant effect on the response variable (P < 0.05) (Table 2), but there was no significant effect of broad-sense heritability. Therefore, only marker-based heritability was included in the regression equation (Y = a + bX), where Y was the significance threshold (−log10 P-value), a was the intercept, and b was the slope of the regression coefficient for the marker-based heritability (X) in maize, soybean, and rice. Table 2 shows the intercept and slope of regression equations in 10 out of 100 different repetitions. We used the raw value of the intercept and slope from 100 different repetitions to develop the final formula. Although the fit of the regression equation was poor for maize (R² = 0.14) and rice (R² = 0.16), and was moderate for soybean (R² = 0.35), these regressions were highly significant (P < 0.0001) and indicate that the predictor variables still provide information about the response even though data points fall further from the regression line.
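As an illustration of how the regression equation Y = a + bX can be applied, the following minimal Python sketch computes a threshold from a marker-based heritability estimate. It assumes the rounded "All Raw Data" intercepts and slopes printed in Table 2 and heritability expressed in percent; the names `COEFFICIENTS`, `significance_threshold`, and `p_value_cutoff` are hypothetical and are not part of the original R workflow. Because the printed coefficients are rounded, values computed this way may differ slightly from the thresholds reported in the figures.

```python
# Intercept (a) and slope (b) per crop, copied from the "All Raw Data" row
# of Table 2 as printed (rounded) in the article.
COEFFICIENTS = {
    "maize": (2.77, 0.025),
    "soybean": (2.16, 0.028),
    "rice": (2.53, 0.017),
}


def significance_threshold(crop: str, marker_h2_percent: float) -> float:
    """Return the -log10(P) threshold Y = a + b * X for the given crop.

    marker_h2_percent is the marker-based heritability in percent (0-100).
    """
    if not 0.0 <= marker_h2_percent <= 100.0:
        raise ValueError("heritability must be given in percent (0-100)")
    a, b = COEFFICIENTS[crop]
    return a + b * marker_h2_percent


def p_value_cutoff(crop: str, marker_h2_percent: float) -> float:
    """Convert the -log10(P) threshold back to a raw P-value cutoff."""
    return 10.0 ** (-significance_threshold(crop, marker_h2_percent))


if __name__ == "__main__":
    # Soybean carbon isotope ratio (C13): marker-based heritability of 28.6%.
    y = significance_threshold("soybean", 28.6)
    print(f"-log10(P) threshold: {y:.2f}")  # prints 2.96
```

Keeping the coefficients in a per-crop lookup mirrors the article's point that the intercept and slope are specific to each species and its LD pattern; coefficients for another crop would have to be estimated from its own simulations.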
Fig. 1 Manhattan plots of -Log10 (P) vs. chromosomal position of SNP markers associated with ear diameter (ED) and days to pollination (DP),
and quantile-quantile (QQ) plots in maize from the Fixed and random model Circulating Probability Unification (FarmCPU). Marker-based
heritability was 66.8% for DP and 84.9% for ED. A red line represents the significant threshold (−Log10 (P) values: 4.89 for DP and 5.49 for ED),
which was determined using our formula based on the marker-based heritability, a blue line represents the threshold from the FDR, and a green
line represents the threshold from the Bonferroni correction method
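For comparison with the formula-based threshold, Bonferroni and FDR lines of the kind drawn in these plots can be approximated as below. This is a sketch only: it assumes the standard Benjamini-Hochberg step-up procedure [15] for the FDR and a cutoff of 0.05, and the function names are hypothetical. The exact software settings used to draw the published figures are not stated here, so the sketch should not be read as the authors' implementation.

```python
import math
from typing import Optional, Sequence


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """-log10 of the Bonferroni per-test cutoff alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return -math.log10(alpha / n_tests)


def bh_threshold(p_values: Sequence[float], alpha: float = 0.05) -> Optional[float]:
    """-log10 of the largest P-value rejected by Benjamini-Hochberg.

    Returns None when no test is rejected at the given alpha.
    """
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest sorted P-value with p <= alpha * rank / m.
        if p <= alpha * rank / m:
            cutoff = p
    if cutoff is None:
        return None
    if cutoff == 0.0:
        return math.inf  # a P-value of exactly zero has no finite -log10
    return -math.log10(cutoff)


if __name__ == "__main__":
    # Toy P-values for eight markers (illustrative, not from the article).
    toy = [1e-8, 1e-6, 1e-4, 0.01, 0.2, 0.5, 0.8, 0.9]
    print(f"Bonferroni: {bonferroni_threshold(len(toy)):.2f}")
    print(f"BH FDR:     {bh_threshold(toy):.2f}")
```

In this toy example the FDR line sits below the Bonferroni line, which is the ordering expected when the step-up rule rejects more than one marker.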
Table 1 Significant P-values (−log10 P-value) from FarmCPU at which all 10 simulated QTLs were identified, for 9 simulated traits that varied in broad sense heritability (H = 10, 20, 30, 40, 50, 60, 70, 80, 90%) in maize, soybean, and rice
Simulated trait   Maize   Rice   Soybean
H10_Q10            3.54   3.94      3.17
H20_Q10            3.67   3.91      3.58
H30_Q10            4.00   4.05      3.64
H40_Q10            4.17   4.23      3.84
H50_Q10            4.68   4.29      4.03
H60_Q10            4.84   4.45      4.12
H70_Q10            5.07   4.65      4.73
H80_Q10            7.02   5.39      5.62
H90_Q10           15.08   7.45      7.95

Table 2 Intercept (a) and slope (b) values of regression equations (Y = a + bX), predicting the significance threshold (−log10 P-value) as a function of the marker-based heritability (X) in maize, soybean, and rice
               Maize                               Soybean                             Rice
Repetition     Constant  Slope  R²    P-value      Constant  Slope  R²    P-value      Constant  Slope  R²    P-value
1              2.49      0.032  0.15  0.008        2.10      0.027  0.45  4.4e-07      2.59      0.016  0.11  0.02
2              2.91      0.022  0.10  0.03         2.05      0.030  0.40  2.9e-06      2.58      0.015  0.11  0.02
3              2.71      0.031  0.14  0.01         2.09      0.033  0.22  0.001        2.52      0.017  0.18  0.004
4              2.93      0.019  0.13  0.01         2.20      0.026  0.36  1.3e-05      2.26      0.021  0.19  0.003
5              2.75      0.024  0.11  0.02         2.01      0.032  0.40  3.7e-06      2.33      0.022  0.20  0.002
6              2.88      0.022  0.09  0.04         2.28      0.027  0.42  1.3e-06      2.62      0.016  0.13  0.01
7              2.87      0.022  0.15  0.008        2.18      0.026  0.40  3.6e-06      2.62      0.020  0.11  0.02
8              2.75      0.026  0.13  0.01         2.16      0.027  0.36  1.5e-05      2.41      0.017  0.21  0.001
9              2.47      0.034  0.12  0.01         2.10      0.030  0.39  3.9e-06      2.64      0.017  0.14  0.01
10             2.68      0.027  0.13  0.01         2.14      0.028  0.39  4.0e-06      2.51      0.018  0.19  0.003
All Raw Data   2.77      0.025  0.14  7.6e-15      2.16      0.028  0.35  < 2.2e-16    2.53      0.017  0.15  2.8e-16

For datasets based on previously reported results, estimated marker-based heritability was 66.8% for DP and 84.9% for ED in maize, 28.6% for C13 and 77.8% for CW in soybean, and 42.8% for SD and 68.8% for PH in rice. These marker-based heritability values were used to determine significance threshold (−log10(P)) values as shown in Figs. 1, 2, and 3 based upon the regression equation for each respective crop in Table 2. Additional file 1: Figure S1 shows the relationship between the response significance threshold and marker-based heritability in maize, soybean, and rice.
Manhattan and QQ plots in Figs. 1–3 show the comparisons of our formula-based threshold (a red line) with FDR (a blue line) and Bonferroni correction (a green line) methods using previously published datasets for DP and ED in maize (Fig. 1), C13 and CW in soybean (Fig. 2), and SD and PH in rice (Fig. 3). The sharp break upwards in QQ plots indicates where the P-value threshold for true associations begins [19]. The P-value threshold determined using our method captured more true positives than the FDR and Bonferroni correction methods, as indicated by being closer to the breakpoint at which the observed P-value increases sharply. Some of the extra markers that were identified for previously published datasets by our formula-based threshold were coincident in the same genomic region of previously reported QTL studies for that trait (data not shown). Higher broad sense heritability traits in these crops had higher significance threshold values. Among crop species, maize, with a lower LD pattern, had higher significance threshold values as compared to soybean and rice (Figs. 1, 2, 3).
We also used one simulated trait in soybean that had 60% broad sense heritability and 10 QTLs in three randomly selected repetitions (R4, R7, and R9) to determine whether our formula accurately estimated the threshold P-values that identified the 10 simulated QTLs. The simulated trait had different marker-based heritability values in the different repetitions: 48.6% (R4), 43.2% (R7), and 39.1% (R9). Using this marker-based heritability, significance threshold P-values were determined for the simulated trait in all three repetitions. Results indicated that our formula-based threshold values identified 10 QTLs for this simulated trait in these three repetitions across different parts of the genome (Fig. 4). The sharp break upwards in QQ plots from this simulated trait in all three repetitions also indicated that our formula-based threshold values identified 10 true associations (Fig. 4).
Using the equation developed from marker-based heritability, we evaluated our threshold P-values with other multiple testing comparison methods using the GWAS results from the previously-published phenotypic datasets in maize [23], soybean [19, 20], and rice [24]. The results indicated that selection of significance threshold
Fig. 2 Manhattan plots of -Log10 (P) vs. chromosomal position of SNP markers associated with canopy wilting (CW) and carbon isotope ratio
(C13), and quantile-quantile (QQ) plots in soybean from the Fixed and random model Circulating Probability Unification (FarmCPU). Marker-based
heritability was 28.6% for C13 and 77.8% for CW. A red line represents the significant threshold (−Log10 (P) values: 2.96 for C13 and 4.39 for CW),
which was determined using our formula based on the marker-based heritability, a blue line represents the threshold from the FDR, and a green
line represents the threshold from the Bonferroni correction method
Fig. 3 Manhattan plots of -Log10 (P) vs. chromosomal position of SNP markers associated with seeds per panicle (SD) and plant height (PH), and
quantile-quantile (QQ) plots in rice from the Fixed and random model Circulating Probability Unification (FarmCPU). Marker-based heritability
was 42.8% for SD and 68.8% for PH. A red line represents the significant threshold (−Log10 (P) values: 3.28 for SD and 3.75 for PH), which was
determined using our formula based on the marker-based heritability, a blue line represents the threshold from the FDR, and a green line
represents the threshold from the Bonferroni correction method
values based on our formula was less conservative than other multiple comparisons in controlling both false positives and false negatives (Table 3). Table 3 shows the comparisons of having no correction (uncorrected P ≤ 0.05) with our formula, Bonferroni correction, and FDR. Because Bonferroni, Šidák, Hommel, and Hochberg corrections had similar results, and False Discovery Rate and Positive False Discovery Rate had similar results, only Bonferroni correction and FDR are shown in Table 3. For all traits in maize, soybean, and rice, our formula was less conservative in identifying true positive associations as compared to both FDR and Bonferroni correction methods (Table 3). The column marked none in Table 3 represents the selection of significant SNPs at a threshold value (−log10 P ≥ 3.5), which was an arbitrary selection. Our formula identified a greater number of markers than the uncorrected method for the C13 trait in soybean, which might be due to the generation of false negatives in the uncorrected method.
These results indicate that the selection of significance threshold values varies among populations and crop species and depends on the heritability of the trait in a particular environment. The GWAS results for these comparisons were obtained from the FarmCPU model because this multi-locus model effectively controlled false positives that arise from population structure and family relatedness as compared to all MLM models (Kaler et al. unpublished results), which are single-locus models.

Conclusions
We developed a simple method for determining the threshold P-value for GWAS based upon the marker-based heritability of a trait in a specific environment. This method is simple and robust across a wide range of heritabilities and species with different LD. This method is less conservative and captures more true positives as compared to more conservative methods such as FDR and Bonferroni corrections.

Methods
Data collection
To develop a formula for a significance threshold, we used 45 simulated traits in soybean, maize, and rice that varied in broad sense heritability and the number of QTLs (Q). We used an R code script for simulation, where real genotypic data of each crop were used and different numbers of QTLs and heritabilities were assigned to create a simulated phenotype. In soybean, genotypic data consisted of 42,509 SNP markers (www.soybase.org) for 346 accessions that were previously reported by Kaler et al. [19, 20]. Phenotypic data for canopy wilting and carbon isotope ratio for these 346 accessions are provided in Additional file 1: Table S1. In maize, genotypic data
Fig. 4 Manhattan plots of -Log10 (P) vs. chromosomal position of SNP markers associated with soybean simulated trait that had 60% heritability
and 10 QTLs from three randomly selected repetitions (R4, R7, and R9) using the real SNP markers dataset, and quantile-quantile (QQ) plots in
soybean from the Fixed and random model Circulating Probability Unification (FarmCPU). Estimated marker-based heritability of this simulated
trait was 48.6% in R4, 43.2% in R7, and 39.1% in R9, which was used in the formula to select significant thresholds -Log10 (P) values, such as 3.54 in
R4, 3.38 in R7, and 3.26 in R9. A red line represents the significant threshold values in these different repetitions. For all three repetitions, 10
markers were identified above the threshold value but in some cases these may be hidden behind other markers
consisted of 50,896 SNP markers for 273 accessions [25]. In rice, genotypic data consisted of 44,100 SNP markers for 352 accessions that were obtained from two projects: (1) OryzaSNP project, an oligomer array-based re-sequencing effort using Perlegen Sciences technology, and (2) BAC clone Sanger sequencing of wild species from the OMAP project [24].
The 45 phenotypic traits were simulated using an R code script (Additional file 1: Table S2). The simulations represent nine levels of broad sense heritability (10, 20, 30, 40, 50, 60, 70, 80, and 90%) combined with five levels of the number of QTLs associated with the simulated trait (10, 20, 30, 40, and 50 QTLs), giving 9 × 5 = 45 combinations. These 45 simulations were repeated 100 times each.

Formula development
A formula to determine a significance threshold was developed based on a multiple regression equation that used two independent variables, broad-sense heritability and marker-based heritability, and one response variable, −log10(P)-values. Broad-sense heritability was the heritability that was used to simulate the trait, and marker-based heritability was estimated using genetic variance determined from a simulated trait and genotypic marker data [26] that were obtained from the GAPIT R package [27]. In the GAPIT package, the MLM model can be described as follows: Y = Xβ + Zu + e, where Y is the vector of observed phenotypes; β is an unknown vector containing fixed effects, including the genetic marker, population structure (Q), and the intercept; u is an
unknown vector of random additive genetic effects from multiple background QTL for individuals/lines; X and Z are the known design matrices; and e is the unobserved vector of residuals. The u and e vectors are assumed to be normally distributed with a null mean and a variance of $\mathrm{Var}\begin{pmatrix} u \\ e \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix}$, where $G = \sigma_a^2 K$ with $\sigma_a^2$ as the additive genetic variance and K as the kinship matrix. Homogeneous variance is assumed for the residual effect; i.e., $R = \sigma_e^2 I$, where $\sigma_e^2$ is the residual variance. The proportion of the total variance explained by the genetic variance, $\sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$ under this model, is defined as marker-based heritability.
The response variable was the −log10(P)-value determined from the association analysis of a simulated trait that identified the number of QTLs for that simulated trait. For example, if a simulated trait had 10 QTLs, then the significant −log10(P)-value was selected that identified these 10 QTLs after performing association analysis using the FarmCPU model [10]. FarmCPU is a multi-locus model that was used for association analysis because it performs better than other models in controlling false positives and false negatives [19].

Validation and comparison of the formula
We validated this formula using the GWAS results from previously-published phenotypic datasets in soybean, maize, and rice. The GWAS results were obtained after performing association analysis on the datasets including carbon isotope ratio (C13) [20] and canopy wilting (CW) [19] in soybean, days to pollination (DP) and ear diameter (ED) in maize [23], and seeds per panicle (SD) and plant height (PH) in rice [24]. We also compared our formula with different multiple testing comparisons, including Bonferroni, Šidák, Hommel, Hochberg, False Discovery Rate, and Positive False Discovery Rate [11–15] with a significance cutoff of 0.05. The GWAS results obtained from compressed mixed linear model (CMLM) and FarmCPU models were also used in these comparisons.

Table 3 Comparisons of the number of markers identified as significant based upon various criteria
Crop      Trait   None   MBH   Bon   FDR
Maize     DP        24    11     5    10
Maize     ED        19     8     5     6
Soybean   C13       12    15     3     3
Soybean   CW        38    13     6    11
Rice      SD        11    11     5     8
Rice      PH        21    17     7    12
The column marked ‘None’ represents the selection of significant SNPs at an arbitrary threshold value (−Log10 P ≥ 3.5). The column marked MBH represents the number of markers identified using the marker-based-heritability-regression method. Columns marked Bon and FDR refer to Bonferroni corrections and positive False Discovery Rate, respectively, for the number of significant markers that were selected based on a cutoff of 0.05. Data sets for these analyses were from previously published reports for days to pollination (DP) and ear diameter (ED) in maize, carbon isotope ratio (C13) and canopy wilting (CW) in soybean, and seeds per panicle (SD) and plant height (PH) in rice

Additional file
Additional file 1: Figure S1. Scatter plots between significant threshold and marker-based heritability in maize, soybean, and rice. Table S1. Phenotypic data of canopy wilting (CW) and carbon isotope ratio (C13) from 346 soybean accessions previously reported by Kaler et al. (19, 20). Table S2. The R code script used for trait simulation for rice data. Similar programming can be used for other crops by changing the genotypic data. (DOCX 176 kb)

Abbreviations
CW: Canopy wilting; DP: Days to pollination; ED: Ear diameter; GWAS: Genome-wide association study; LD: Linkage disequilibrium; LM: Linkage mapping; MAF: Minor allele frequency; MLM: Mixed linear model; PH: Plant height; QTLs: Quantitative trait loci; SD: Seeds per panicle; SNPs: Single nucleotide polymorphisms

Acknowledgements
Not applicable.

Authors’ contributions
ASK conceived of the idea. ASK and LCP developed and wrote the manuscript. Both authors approved of the final manuscript.

Funding
Partial funding for this report was provided by the United Soybean Board, project number 1920–172-0116-A. The funders were not involved in the planning of this research work, data analysis, or manuscript writing.

Availability of data and materials
The R code script used for trait simulation in this study is provided using as an example the script for rice data. Similar programming can be used for other crops by changing the genotypic data.
The 346 soybean genotypes used in this study are part of 19,652 G. max and G. soja accessions genotyped with SoySNP50K iSelect Beadchip (http://www.soybase.org/snps/download.php). Additional file 1: Table S1 provides phenotype data for soybean canopy wilting and carbon isotope ratio. Similarly, the 279 maize genotypes and 352 rice genotypes are also available to the public at the websites https://www.panzea.org/data and http://www.ricediversity.org/data/, respectively.

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Received: 10 April 2019 Accepted: 23 July 2019

References
1. Zhu C, Gore M, Buckler ES, Yu J. Status and prospects of association mapping in plants. Plant Genome. 2008;1(1):5-20. Available from: https://www.crops.org/publications/tpg/abstracts/1/1/5.
2. Syvanen A-C. Toward genome-wide SNP genotyping. Nat Genet. United States; 2005 Jun;37 Suppl:S5–10.
3. Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet United States. 2006;38(2):203–8.
4. Zhang Z, Ersoz E, Lai C-Q, Todhunter RJ, Tiwari HK, Gore MA, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet [Internet]. Nat Publ Group. 2010;42:355. Available from: https://doi.org/10.1038/ng.546.
5. Wen Y-J, Zhang H, Ni Y-L, Huang B, Zhang J, Feng J-Y, et al. Methodological implementation of mixed linear models in multi-locus genome-wide association studies. Brief Bioinform [Internet]. 2018;19(4):700–712. Available from: https://academic.oup.com/bib/article/19/4/700/2965637
6. Tamba CL, Ni Y-L, Zhang Y-M. Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies. Komarova NL, editor. PLOS Comput Biol [Internet]. 2017;13(1):e1005357. Available from: https://doi.org/10.1371/journal.pcbi.1005357.
7. Zhang Y, Liu P, Zhang X, Zheng Q, Chen M, Ge F, et al. Multi-locus genome-wide association study reveals the genetic architecture of stalk lodging resistance-related traits in maize. Front Plant Sci [Internet]. 2018;9. Available from: http://journal.frontiersin.org/article/10.3389/fpls.2018.00611/full.
8. Ren W-L, Wen Y-J, Dunwell JM, Zhang Y-M. pKWmEB: integration of Kruskal–Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study. Heredity (Edinb) [Internet]. 2018;120(3):208–18. Available from: http://www.nature.com/articles/s41437-017-0007-4.
9. Xu Y, Xu C, Xu S. Prediction and association mapping of agronomic traits in maize using multiple omic data. Heredity (Edinb) [Internet]. 2017;119(3):174–84. Available from: http://www.nature.com/doifinder/10.1038/hdy.2017.27.
10. Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. Listgarten J, editor. PLOS Genet [Internet]. 2016 1;12(2):e1005767. Available from: https://doi.org/10.1371/journal.pgen.1005767
11. Sidak Z. Rectangular confidence regions for the means of multivariate Normal distributions. J Am Stat Assoc [Internet]. 1967;62(318):626. Available from: https://www.jstor.org/stable/2283989?origin=crossref.
12. Holm S. A simple sequentially Rejective multiple test procedure. Scand J Stat. 1979;6:65–70.
13. Hommel G. A Stagewise Rejective multiple test procedure based on a modified Bonferroni test. Biometrika [Internet]. 1988;75(2):383. Available from: https://www.jstor.org/stable/2336190?origin=crossref
14. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika [Internet]. 1988;75(4):800–802. Available from: https://academic.oup.com/biomet/article-lookup/doi/10.1093/biomet/75.4.800
15. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995;57:289–300.
16. Perneger T V. What’s wrong with Bonferroni adjustments. BMJ [Internet]. 1998;316(7139):1236–1238. Available from: http://www.bmj.com/cgi/doi/10.1136/bmj.316.7139.1236
17. Buzdugan L, Kalisch M, Navarro A, Schunk D, Fehr E, Bühlmann P. Assessing statistical significance in multivariable genome wide association analysis. Bioinformatics [Internet]. 2016;32(13):1990–2000. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btw128
18. Tenaillon MI, Sawkins MC, Long AD, Gaut RL, Doebley JF, Gaut BS. Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc Natl Acad Sci U S A United States. 2001;98(16):9161–6.
19. Kaler AS, Ray JD, Schapaugh WT, King CA, Purcell LC. Genome-wide association mapping of canopy wilting in diverse soybean genotypes. Theor Appl Genet [Internet]. 2017;130(10):2203–2217. Available from: http://link.springer.com/10.1007/s00122-017-2951-z
20. Kaler AS, Dhanapal AP, Ray JD, King CA, Fritschi FB, Purcell LC. Genome-wide association mapping of carbon isotope and oxygen isotope ratios in diverse soybean genotypes. Crop Sci [Internet]. 2017;57(6):3085. Available from: https://dl.sciencesocieties.org/publications/cs/abstracts/57/6/3085
21. Kaler AS, Ray JD, Schapaugh WT, Asebedo AR, King CA, Gbur EE, et al. Association mapping identifies loci for canopy temperature under drought in diverse soybean genotypes. Euphytica [Internet]. 2018;214(8):135. Available from: http://link.springer.com/10.1007/s10681-018-2215-2
22. Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet [Internet]. Nature Publishing Group, a division of Macmillan Publishers Limited. All Rights Reserved; 2010;42:961. Available from: https://doi.org/10.1038/ng.695.
23. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics [Internet]. 2007;23(19):2633–2635. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btm308
24. Zhao K, Tung C-W, Eizenga GC, Wright MH, Ali ML, Price AH, et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun [Internet]. 2011;2(1):467. Available from: http://www.nature.com/articles/ncomms1467.
25. Wallace JG, Bradbury PJ, Zhang N, Gibon Y, Stitt M, Buckler ES. Association Mapping across Numerous Traits Reveals Patterns of Functional Variation in Maize. Borevitz JO, editor. PLoS Genet [Internet]. 2014 4;10(12):e1004845. Available from: https://doi.org/10.1371/journal.pgen.1004845
26. Kruijer W, Boer MP, Malosetti M, Flood PJ, Engel B, Kooke R, et al. Marker-based estimation of heritability in immortal populations. Genetics [Internet]. 2015;199(2):379–398. Available from: http://www.genetics.org/lookup/doi/10.1534/genetics.114.167916
27. Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ, et al. GAPIT: genome association and prediction integrated tool. Bioinformatics England. 2012;28(18):2397–9.

Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
