The Quest For An All-Inclusive Human Genome


Feature

THE QUEST FOR AN ALL-INCLUSIVE HUMAN GENOME
Efforts are under way to create a ‘pangenome’
that would catalogue almost all human genetic
diversity. But not everyone is ready to sign on.
By Roxanne Khamsi

ILLUSTRATION BY ANA KOVA

Several years ago, after an exhaustive search for uncharted variation in the human genome, Evan Eichler stumbled on something extraordinary. Eichler, a geneticist at the University of Washington in Seattle, and his colleagues struck on a massive stretch of DNA, about 400,000 letters long, that contained extra copies of genes, probably passed on from an ancient hominin group known as the Denisovans¹. It appeared in about 80% of people living in Papua New Guinea, but practically nowhere else.

“We were shocked by the size,” Eichler says. “We always knew there would be archaic segments in our genome.” But the segment’s length and its absence in much of the world, he says, “transformed our thinking”.

This and other unexpected discoveries have made Eichler and other geneticists increasingly dissatisfied with the breadth and depth of the available maps of the human genome. The first draft genome from the US$2.7-billion Human Genome Project, released in 2001, was meant to become a reference point for future genetic research. But 93% of its sequence came from just 11 individuals, many of whom were recruited through a newspaper advertisement in Buffalo, New York; a whopping 70% of the DNA comes from just one man.

By 2003, that reference genome, known as GRCh38, would be deemed technically complete, but it still had hundreds of gaps and sections containing copious errors. These shortcomings came with consequences. Eichler worked with clinical geneticists at his university’s medical centre and found that the reference genome lacks a region that has variants associated with Baratela-Scott syndrome, which can cause cognitive delays and skeletal malformations in children. Because that portion was missing, there was no quick way for the physicians to check for DNA errors there.

Genome maps have improved, but still don’t adequately capture humanity’s vast diversity. For example, in 2018, one group of researchers sequenced 910 individuals of African descent and discovered a sequence consisting of 300 million DNA letters, or bases, that was unfamiliar². That’s roughly 10% of the entire genome.

To create a reference that is more complete and more representative, Eichler has joined forces with a number of high-profile scientists, mostly in the United States. Their goal is to capture almost all human genetic variability: the dizzying number of genetic remixes in the human species, including additions, deletions and other types of mutation.

Rather than depicting the genome as a linear readout from a single individual, it would contain multiple paths branching in and out like the tangle of train lines on the map for the London Underground. These would represent the varieties of sequence that can be found in different populations, such as the long stretch of DNA found in many people from Papua New Guinea.

In 2019, Eichler and his colleagues started the Human Pangenome Project, a $30-million effort funded by the US National Human Genome Research Institute (NHGRI) in Bethesda, Maryland. The initial goal is to do detailed, reference-quality genome sequencing of about 350 people from different backgrounds and to share those data as freely as possible.

The effort will pose a significant technical challenge, but the scientists behind it, including Karen Miga at the University of California, Santa Cruz, and Ting Wang at the Washington University School of Medicine in St. Louis,

378 | Nature | Vol 603 | 17 March 2022


© 2022 Springer Nature Limited. All rights reserved.
argue it is worth it. They see it as crucial to making genomic medicine more equitable³. “To account for diversity is to better serve humanity,” Wang says. “It is about both equity and equality. It is about building a more inclusive genomic resource for humankind.”

The researchers in the pangenome effort are aware of the history of past missions to capture human genetic diversity, some of which were seen as ‘vampire’ projects that took data from marginalized populations and failed to respect their needs and wishes. In response to this, the pangenome effort engages bioethicists throughout the project, not just at periodic junctures, as was done by initiatives in the past. “They are not a separate entity working in silo, they are involved in every step of the project, including all the technical decisions,” Wang says.

Nevertheless, some geneticists focused on the needs of Indigenous communities are wary of the initiative. They aren’t calling for an end to the Human Pangenome Project per se, but they say that marginalized groups deserve control of their genetic data, and of the sequencers, too. “As we position ourselves to be in control of these technologies, we’re empowering our communities,” explains Keolu Fox, a geneticist at the University of California, San Diego, who is Native Hawaiian. “Nothing is as real deal as we are. We’re from our communities.”

“MAKING HUMAN VARIATION INTUITIVE AND EASY TO UNDERSTAND IS PART OF OUR MISSION.”

Panning out

The concept of a pangenome traces back to the study of a bacterium known as Streptococcus agalactiae, or group B streptococcus, which can cause deadly infections in newborns. Scientists analysing six strains of the bacterium published a paper in 2005 trying to capture all of the microbe’s genetic nuances⁴. What they produced was a core genome shared by all six strains and a “dispensable” genome of partially shared and strain-specific genes.

It was a tricky task, because bacteria swap and share bits of DNA, even with other species, mostly through a process known as horizontal gene transfer. “There’s a lot of things that can happen in bacteria,” says Candice Hirsch, a plant geneticist at the University of Minnesota in Saint Paul. As a result, biologists are continually updating the bacterial reference genomes. Humans, by contrast, do not add new variation as easily. That makes characterizing a human pangenome more feasible, Hirsch says.

But what it lacks in dynamics, the human genome makes up for in length and repetition. Chromosome 1, for example, the largest of the 24 different human chromosomes, stretches over about 250 million base pairs. That’s more than 100 times the length of S. agalactiae. And it is riddled with long stretches of simple, repeated sequences and duplications of other, more complex segments. Until the past decade, scientists’ main option for sequencing DNA involved breaking it into fragments and reading it in small chunks. This allows



them to detect single-letter changes in DNA relatively easily. But the short reads make it hard to recognize when a long stretch of DNA contains more than one copy of a gene. Eichler, who has specialized in identifying structural variants such as gene duplications and deletions, has opted for a newer approach, called ‘long-read sequencing’, which analyses bigger stretches of DNA at a time. This is what enabled him to find the previously unnoticed variant in people from Papua New Guinea.

In 2018, Eichler and other scientists gathered at the NHGRI to discuss a human pangenome effort. There, Eichler reconnected with a fellow scientist who shared his passion for long-read technology, Erich Jarvis, a neuroscientist and molecular biologist at Rockefeller University in New York City.

“We kept raising our hands and saying, ‘You’re not going to be able to do that unless you have high-quality reference genomes,’” Jarvis recalls. But long-read sequencing would require more money, and not everyone was keen to deploy it. Jarvis recalls feeling frustrated by some of the debates. “I even chipped a little bit of my front tooth on a fork at a restaurant. I was biting on it so hard,” he says. Ultimately, he and others pushing for the long-read approach won.

Miga, who brings to the project a reputation for completing difficult-to-read sections of DNA, was already using long-read technology. She, along with Jarvis, Eichler and others, published the first-ever completely sequenced human genome, capturing all 3 billion letters, including the messy, highly repetitive sections that cap the ends of chromosomes, known as telomeres⁵. This first telomere-to-telomere genome sequence corrected numerous errors from previous references and uncovered around 100 unnoticed genes that probably code for proteins.

It was no simple feat, however. Typically, human cells contain two sets of 23 chromosomes, one from an egg and one from a sperm cell. But duplicated sequences and other structural DNA variations get jumbled up when machines try to read both sets at the same time. To circumvent this, the scientists analysed the DNA of a cell line derived from what’s known as a molar pregnancy, in which a sperm fertilizes an egg with no nucleus. The DNA contained only one set of chromosomes.

The 350 genomes for the Human Pangenome Project, by contrast, will come from diploid cell lines, that is, cells that contain copies from both parents, so scientists will have to use complex computational tools to tease the genomes apart and make sure they capture the structural variation accurately.

The pangenome effort has already completed around 70 detailed genomes. It aims to finish telomere-to-telomere versions of all 350 by the end of the grant, in mid-2024.

And scientists are already working on ways to visualize the diversity and showcase the variations. Up until now, including for the GRCh38 reference genome, the convention has been to have a simple linear representation and a companion database with variations listed for different positions in the sequence, such as single-letter changes. “The community has used this convenient fiction of linear reference sequence for 20 years,” says Benedict Paten, a computational biologist at the University of California, Santa Cruz. Paten, whose office is next to Miga’s, is collaborating with a group to improve the sophistication of the pangenome visualization. In this new visualization, coloured lines represent distinct variants. More-frequent variations are indicated with thicker lines. “Making human variation intuitive and easy to understand is part of our mission in integrating the pangenome,” Paten says (see ‘Visualizing a pangenome’).

VISUALIZING A PANGENOME
The Human Pangenome Project aims to capture all of the variability in the human genome around the world. By analysing this variation and creating innovative ways to display it, the effort counters the assumption that there is a consensus of what a human genome looks like.

Gathering samples
Researchers will have to produce high-quality sequences for hundreds of individuals and catalogue the variants, including single-letter changes, insertions, deletions and inversions.
[Figure labels: Sequences; Variation; DNA base; Insertion/deletion; Inversion; Single-letter change]

Visualizing variation
Graphical models can present variation data in a way that doesn’t assume a standard, or default reference genome.
[Figure labels: Shared sequence; Insertion/deletion; Single-letter change; Inversion]

Exploring the pangenome
Representations that look like subway maps allow researchers to compare the variations in a population at a sequence level.
[Figure labels: Shared sequence; Insertion/deletion; Single-letter change; Inversion]

Missteps and departures

Many of the 350 people whose genomes will be analysed in the Human Pangenome Project participated in the 1000 Genomes Project, an effort launched in 2008 to catalogue common and rare variants from 26 diverse populations. The DNA samples that were collected as part of that effort will be retrieved from cold storage and repurposed for the more detailed long-reads of the pangenome sequencing project. The consent forms that those individuals signed years ago also cover the use of their DNA data for the new project.

But the Human Pangenome Project is taking further measures to ensure ethical collection and use of genetic data. In contrast to other major genetic sequencing efforts, in which scientists made decisions and then only had them vetted by an Institutional Review Board, for example, the Human Pangenome Project has social ethicists who are “embedded” in the decision-making process and continuously vetting the project, Eichler says.

As Wang puts it: “It’s really about how to guide the nerdy scientists who may not think about social issues to do their science in the most appropriate manner.”

In many ways, the leaders of the pangenome project are trying to overcome the ethically thorny legacy of past endeavours. The Human Genome Diversity Project, for example, launched in 1991 as an effort to collect DNA information from people around the globe, engendered staunch opposition from several communities. Indigenous groups, among others, felt they were being treated as living fossils, headed towards extinction⁶.

“Scientists were collecting Indigenous peoples’ genomic data largely for the benefit of other, non-Indigenous peoples, which, when done without regards to Indigenous data



sovereignty, is a means of continued data extraction,” says Krystal Tsosie, a geneticist and bioethicist at Vanderbilt University in Nashville, Tennessee, and a member of the Navajo Nation.

[Photo: Geneticist Krystal Tsosie. Credit: TOMÁS KARMELO AMAYA/NEW YORK TIMES/REDUX/EYEVINE]

The next decade brought even more concern over ethical transgressions in genetic studies of under-represented groups, notably when the Havasupai Tribe filed a lawsuit against the Arizona Board of Regents and Arizona State University researchers in 2004. Members of the tribe had donated their DNA for genetic studies on type 2 diabetes, but discovered that it had been used without their consent for studies on schizophrenia and migration⁷. The researchers had also used stigmatizing words such as ‘inbreeding’ to explain genetic phenomena that were actually the consequence of population bottlenecks related to genocidal events, says Tsosie. She adds that, in the past, geneticists doing sequencing projects have often used racial language and failed to properly acknowledge the lasting legacy of colonialism in science, and the threat it poses to Indigenous people.

For several years, Fox and others have been calling for a massive departure from this approach. They say that Indigenous groups should have greater agency when it comes to the collection of their genetic data. Fox, who was a graduate student in Eichler’s lab, says that he’s not convinced that the pangenome project and others like it are involving the diverse groups they seek to sample in a way that truly empowers them. “I love Evan, man. When I have problems, I call him for advice,” he says. “Despite that, you know, we don’t agree on everything.”

Fox advocates for an approach that puts sequencing power in the hands of the people. He and Tsosie are involved in the Native BioData Consortium, a non-profit research institute led by Indigenous scientists and tribal members in the United States that has been working to help Indigenous groups to acquire and run DNA sequencers on their own territory. The first sequencer was delivered to the Cheyenne River Sioux reservation in December 2020, says consortium co-founder Joseph Yracheta, a public-health geneticist at the Johns Hopkins Bloomberg School of Public Health in Baltimore. In February, Yracheta joined a Human Pangenome Project working group focused on ethical, legal and social implications of the project.

Fox is currently focused on genetic complexity found in the Pacific islands. He and his teammates are taking a holistic approach to sequence the genomes of agricultural species and other organisms in the environment in tandem, and are building a genomics institute to serve the community. Fox notes that the latest technologies, such as a ‘distributed ledger’ computer system that securely ties a person to their genetic data, can give people greater autonomy about whom they allow to access and use their information. “There are so many advancements in the data sciences right now that really allow for a new level of agency for participants,” Fox says.

Eichler is supportive of Fox’s path. “I applaud his efforts to engage Indigenous scientists into genomics research; we need more of it,” Eichler says. “It is not an either–or scenario, however, in my opinion.” He adds that the Human Pangenome Project is encouraging Indigenous scientists to generate their own reference genomes. In those scenarios, “we will work together to make it happen by providing expertise and tools as needed”.

“IF IT’S GOING TO HAPPEN, IT NEEDS TO HAPPEN IN THE BEST WAY THAT REPRESENTS INDIGENOUS PEOPLE.”

No mutation without representation

Tsosie says that Indigenous groups might collaborate with big diversity projects in the future, but that it would have to happen in a way that would ensure that such communities can do their own sequencing. Moreover, although these major genome projects are often open-data efforts, Tsosie says it would be wise for there to be protections added for Indigenous people’s deposited DNA sequences such that they be available only through access requests to avoid exploitation. “If it’s going to happen, it needs to happen in the best way that represents Indigenous people,” she says.

It’s not just advocates from Indigenous communities in the United States who have voiced concerns about representation and data ownership. Others have argued that the pangenome project hasn’t adequately involved researchers from regions outside the United States, according to Jarvis, who is on the project’s sampling committee. He recognizes that some see the initiative as a largely US effort, but says that he and his collaborators are working to broaden it and involve scientists and participants from different parts of the world. For example, they have reached out to leaders of the Human Heredity and Health in Africa (H3Africa) programme to involve scientists in Africa who can do sequencing in countries there. (No sequencing effort seems immune from ethical challenges, however; even the H3Africa programme has had to straddle different countries’ rules and norms governing the use of participant data, for example.)

Jarvis says he wants the Human Pangenome Project to achieve a better representation of human genetic diversity. “I’m a person of colour. I grew up as an African American. I grew up as an under-represented minority in the sciences,” he says. “My diversity is not represented. So I have a personal motivation and a societal one to make sure that this pangenome really represents populations.”

As they push forwards, the scientists also acknowledge that 350 genomes will not represent all human diversity. Ultimately, the true number of genomes needed to do this is difficult to pin down, and genetics often teaches us that rare differences can be important. “I don’t think there is any magic number,” says Adam Phillippy, head of the Genome Informatics Section at the NHGRI, and an investigator on the pangenome project.

Juggling the massive scientific undertaking while trying to avoid ethical pitfalls is something that weighs heavily on the pangenome researchers. “I’m sure there will be things that we will do that people will criticize five or ten years from now. I’m almost 100% sure of it,” Eichler says. “But if we can go in with a clear conscience and say, we tried to do everything we possibly could to do it right, I feel that that’s something.”

Roxanne Khamsi is a science journalist based in Montreal.

1. Hsieh, P. et al. Science 366, eaax2083 (2019).
2. Sherman, R. M. et al. Nature Genet. 51, 30–35 (2019).
3. Miga, K. H. & Wang, T. et al. Annu. Rev. Genom. Hum. Genet. 22, 81–102 (2021).
4. Tettelin, H. et al. Proc. Natl Acad. Sci. USA 102, 13950–13955 (2005).
5. Nurk, S. et al. Preprint at bioRxiv https://doi.org/10.1101/2021.05.26.445798 (2021).
6. Dodson, M. & Williamson, R. J. Med. Ethics 25, 204–208 (1999).
7. Garrison, N. A. et al. Sci. Technol. Hum. Values 38, 201–223 (2013).


