Where does the data come from?

Genotype-phenotype associations, along with study information, are manually extracted from the literature and entered into the GWAS Catalog. This information is curated by experts, and then made freely available and searchable via our website to allow scientists to interpret the data accurately (Figure 5).

Figure 5 Flow diagram to show how studies are included in the GWAS Catalog.

Eligibility Criteria

Studies are included in the GWAS Catalog if they carry out genome-wide array or sequencing-based genotyping and association analysis of ≥100,000 SNPs. This includes previously published GWAS that are incorporated into new analyses (meta-analyses). Further details on Catalog eligibility can be found here. Note that although in most cases the variants typed by SNP arrays are single nucleotide variants, indels may also be typed.

For each of these studies, variant-trait associations are included if:

  • They have a p-value < 1.0 × 10⁻⁵ in the overall (initial GWAS + replication) population
  • They involve the most significant variant at an independent locus (only the top variant from each locus is extracted)
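
The two levels of filtering described above (a study-level SNP count and an association-level p-value threshold, followed by keeping one variant per locus) can be sketched in a few lines of Python. This is only an illustration of the stated criteria, not the Catalog's actual curation pipeline, which is manual. The Association class, its field names, the function names and the example values are all hypothetical, and the sketch assumes each association has already been assigned to an independent locus, which the text does not describe how to do.

```python
from dataclasses import dataclass

MIN_SNPS = 100_000     # study-level threshold stated in the text
P_THRESHOLD = 1.0e-5   # association-level threshold stated in the text


@dataclass(frozen=True)
class Association:
    """Hypothetical record for one variant-trait association."""
    variant: str    # variant identifier (made-up values below)
    locus: str      # label of the independent locus (assumed to be given)
    p_value: float  # p-value in the overall (initial + replication) population


def study_is_eligible(snps_analysed: int) -> bool:
    """A study qualifies if it analysed at least MIN_SNPS SNPs."""
    return snps_analysed >= MIN_SNPS


def eligible_associations(associations):
    """Keep the most significant association per locus, if p < P_THRESHOLD."""
    best = {}
    for assoc in associations:
        if assoc.p_value >= P_THRESHOLD:
            continue  # fails the p-value criterion
        current = best.get(assoc.locus)
        if current is None or assoc.p_value < current.p_value:
            best[assoc.locus] = assoc
    # Return results ordered from most to least significant
    return sorted(best.values(), key=lambda a: a.p_value)


# Made-up example data, for illustration only
example = [
    Association("variant_1", "locus_A", 3.0e-8),
    Association("variant_2", "locus_A", 2.0e-6),  # same locus, less significant
    Association("variant_3", "locus_B", 4.0e-4),  # fails the p-value threshold
    Association("variant_4", "locus_C", 9.0e-6),
]
selected = eligible_associations(example)
```

In this made-up example, variant_1 and variant_4 are kept: variant_2 is dropped because variant_1 is more significant at the same locus, and variant_3 is dropped because its p-value is not below the threshold.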

Which data from these studies are included in the GWAS Catalog?

For each study that we curate, we extract data to describe the study design and allow accurate interpretation of the GWAS, along with eligible results, as described above. The diagram below (Figure 6) summarises the data that we extract and enter into the Catalog:

Figure 6 Summary of data that is extracted and entered into the GWAS Catalog.

Each of these datatypes is discussed in more detail in the guided example.